
Scheduling Multithreaded Computations by Work Stealing

Robert D. Blumofe
The University of Texas at Austin

Charles E. Leiserson
MIT Laboratory for Computer Science

This research was done while Robert D. Blumofe was at the MIT Laboratory for Computer Science and was supported in part by an ARPA High-Performance Computing Graduate Fellowship. This research was supported in part by the Advanced Research Projects Agency under Contract N

Abstract

This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMD-style computation is "work stealing," in which processors needing work steal computational threads from other processors. In this paper, we give the first provably good work-stealing scheduler for multithreaded computations with dependencies.

Specifically, our analysis shows that the expected time to execute a fully strict computation on $P$ processors using our work-stealing scheduler is $T_1/P + O(T_\infty)$, where $T_1$ is the minimum serial execution time of the multithreaded computation and $T_\infty$ is the minimum execution time with an infinite number of processors. Moreover, the space required by the execution is at most $S_1 P$, where $S_1$ is the minimum serial space requirement. We also show that the expected total communication of the algorithm is at most $O(P T_\infty (1 + n_d) S_{\max})$, where $S_{\max}$ is the size of the largest activation record of any thread and $n_d$ is the maximum number of times that any thread synchronizes with its parent. This communication bound justifies the folk wisdom that work-stealing schedulers are more communication efficient than their work-sharing counterparts. All three of these bounds are existentially optimal to within a constant factor.

1 Introduction

For efficient execution of a dynamically growing "multithreaded" computation on a MIMD-style parallel computer, a scheduling algorithm must ensure that enough threads are active concurrently to keep the processors busy. Simultaneously, it should ensure that the number of concurrently active threads remains within reasonable limits so that memory requirements are not unduly large. Moreover, the scheduler should also try to maintain related threads on the same processor, if possible, so that communication between them can be minimized. Needless to say, achieving all these goals simultaneously can be difficult.

Two scheduling paradigms have arisen to address the problem of scheduling multithreaded computations: work sharing and work stealing. In work sharing, whenever a processor generates new threads, the scheduler attempts to migrate some of them to other processors in hopes of distributing the work to underutilized processors. In work stealing, however, underutilized processors take the initiative: they attempt to "steal" threads from other processors. Intuitively, the migration of threads occurs less frequently with work stealing than with work sharing, since when all processors have work to do, no threads are migrated by a work-stealing scheduler, but threads are always migrated by a work-sharing scheduler.

The work-stealing idea dates back at least as far as Burton and Sleep's research on parallel execution of functional programs [16] and Halstead's implementation of Multilisp [30]. These authors point out the heuristic benefits of work stealing with regards to space and communication. Since then, many researchers have implemented variants on this strategy [11, 21, 23, 29, 34, 37, 46]. Rudolph, Slivkin-Allalouf, and Upfal [43] analyzed a randomized work-stealing strategy for load balancing independent jobs on a parallel computer, and Karp and Zhang [33] analyzed a randomized work-stealing strategy for parallel backtrack search. Recently, Zhang and Ortynski [48] have obtained good bounds on the communication requirements of this algorithm.

In this paper, we present and analyze a work-stealing algorithm for scheduling "fully strict" (well-structured) multithreaded computations. This class of computations encompasses both backtrack search computations [33, 48] and divide-and-conquer computations [47], as well as dataflow computations [2] in which threads may stall due to a data dependency. We analyze our algorithms in a stringent atomic-access model similar to the atomic message-passing model of [36] in which concurrent accesses to the same data structure are serially queued by an adversary.

Our main contribution is a randomized work-stealing scheduling algorithm for fully strict multithreaded computations which is provably efficient in terms of time, space, and communication. We prove that the expected time to execute a fully strict computation on $P$ processors using our work-stealing scheduler is $T_1/P + O(T_\infty)$, where $T_1$ is the minimum serial execution time of the multithreaded computation and $T_\infty$ is the minimum execution time with an infinite number of processors. In addition, the space required by the execution is at most $S_1 P$, where $S_1$ is the minimum serial space requirement. These bounds are better than previous bounds for work-sharing schedulers [10], and the work-stealing scheduler is much simpler and eminently practical. Part of this improvement is due to our focusing on fully strict computations, as compared to the (general) strict computations studied in [10]. We also prove that the expected total communication of the execution is at most $O(P T_\infty (1 + n_d) S_{\max})$, where $S_{\max}$ is the size of the largest activation record of any thread and $n_d$ is the maximum number of times that any thread synchronizes with its parent. This bound is existentially tight to within a constant factor, meeting the lower bound of Wu and Kung [47] for communication in parallel divide-and-conquer. In contrast, work-sharing schedulers have nearly worst-case behavior for communication. Thus, our results bolster the folk wisdom that work stealing is superior to work sharing.

Others have studied and continue to study the problem of efficiently managing the space requirements of parallel computations. Culler and Arvind [19] and Ruggiero and Sargeant [44] give heuristics for limiting the space required by dataflow programs. Burton [14] shows how to limit space in certain parallel computations without causing deadlock. More recently, Burton [15] has developed and analyzed a scheduling algorithm with provably good time and space bounds. Blelloch, Gibbons, Matias, and Narlikar [3, 4] have also recently developed and analyzed scheduling algorithms with provably good time and space bounds. It is not yet clear whether any of these algorithms are as practical as work stealing.

The remainder of this paper is organized as follows. In Section 2 we review the graph-theoretic model of multithreaded computations introduced in [10], which provides a theoretical basis for analyzing schedulers. Section 3 gives a simple scheduling algorithm which uses a central queue. This "busy-leaves" algorithm forms the basis for our randomized work-stealing algorithm, which we present in Section 4. In Section 5 we introduce the atomic-access model that we use to analyze execution time and communication costs for the work-stealing algorithm, and we present and analyze a combinatorial "balls and bins" game that we use to derive a bound on the contention that arises in random work stealing. We then use this bound along with a delay-sequence argument [41] in Section 6 to analyze the execution time and communication cost of the work-stealing algorithm. To conclude, in Section 7 we briefly discuss how the theoretical ideas in this paper have been applied to the Cilk programming language and runtime system [8, 25], as well as make some concluding remarks.

2 A model of multithreaded computation

This section reprises the graph-theoretic model of multithreaded computation introduced in [10]. We also define what it means for computations to be "fully strict." We conclude with a statement of the greedy-scheduling theorem, which is an adaptation of theorems by Brent [13] and Graham [27, 28] on dag scheduling.
A multithreaded computation is composed of a set of threads, each of which is a sequential ordering of unit-time instructions. The instructions are connected by dependency edges, which provide a partial ordering on which instructions must execute before which other instructions. In Figure 1, for example, each shaded block is a thread with circles representing instructions and the horizontal edges, called continue edges, representing the sequential ordering. Thread $\Gamma_5$ of this example contains 3 instructions: $v_{10}$, $v_{11}$, and $v_{12}$. The instructions of a thread must execute in this sequential order from the first (leftmost) instruction to the last (rightmost) instruction. In order to execute a thread, we allocate for it a chunk of memory, called an activation frame, that the instructions of the thread can use to store the values on which they compute.

A $P$-processor execution schedule for a multithreaded computation determines which processors of a $P$-processor parallel computer execute which instructions at each step. An execution schedule depends on the particular multithreaded computation and the number $P$ of processors. In any given step of an execution schedule, each processor executes at most one instruction.

During the course of its execution, a thread may create, or spawn, other threads. Spawning a thread is like a subroutine call, except that the spawning thread can operate concurrently with the spawned thread. We consider spawned threads to be children of the thread that did the spawning, and a thread may spawn as many children as it desires. In this way, threads are organized into a spawn tree as indicated in Figure 1 by the downward-pointing, shaded dependency edges, called spawn edges, that connect threads to their spawned children. The spawn tree is the parallel analog of a call tree. In our example computation, the spawn tree's root thread $\Gamma_1$ has two children, $\Gamma_2$ and $\Gamma_6$, and thread $\Gamma_2$ has three children, $\Gamma_3$, $\Gamma_4$, and $\Gamma_5$. Threads $\Gamma_3$, $\Gamma_4$, $\Gamma_5$, and $\Gamma_6$, which have no children, are leaf threads.

Figure 1: A multithreaded computation. This computation contains 23 instructions $v_1, v_2, \ldots, v_{23}$ and 6 threads $\Gamma_1, \Gamma_2, \ldots, \Gamma_6$.

Each spawn edge goes from a specific instruction (the instruction that actually does the spawn operation) in the parent thread to the first instruction of the child thread. An execution schedule must obey this edge in that no processor may execute an instruction in a spawned child thread until after the spawning instruction in the parent thread has been executed. In our example computation (Figure 1), due to the spawn edge $(v_6, v_7)$, instruction $v_7$ cannot be executed until after the spawning instruction $v_6$. Consistent with our unit-time model of instructions, a single instruction may spawn at most one child. When the spawning instruction executes, it allocates an activation frame for the new child thread. Once a thread has been spawned and its frame has been allocated, we say the thread is alive or living. When the last instruction of a thread executes, it deallocates its frame and the thread dies.

An execution schedule generally respects other dependencies besides those represented by continue and spawn edges. Consider an instruction that produces a data value to be consumed by another instruction. Such a producer/consumer relationship precludes the consuming instruction from executing until after the producing instruction. To enforce such orderings, other dependency edges, called join edges, may be required, as shown in Figure 1 by the curved edges. If the execution of a thread arrives at a consuming instruction before the producing instruction has executed, execution of the consuming thread cannot continue: the thread stalls. Once the producing instruction executes, the join dependency is resolved, which enables the consuming thread to resume its execution: the thread becomes ready. A multithreaded computation does not model the means by which join dependencies get resolved or by which unresolved join dependencies get detected. In implementation, resolution and detection can be accomplished using mechanisms such as join counters [8], futures [30], or I-structures [2].

We make two technical assumptions regarding join edges. We first assume that each instruction has at most a constant number of join edges incident on it. This assumption is consistent with our unit-time model of instructions. The second assumption is that no join edges enter the instruction immediately following a spawn. This assumption means that when a parent thread spawns a child thread, the parent cannot immediately stall. It continues to be ready to execute for at least one more instruction.

An execution schedule must obey the constraints given by the spawn, continue, and join edges of the computation. These dependency edges form a directed graph of instructions, and no processor may execute an instruction until after all of the instruction's predecessors in this graph have been executed. So that execution schedules exist, this graph must be acyclic. That is, it must be a directed acyclic graph, or dag. At any given step of an execution schedule, an instruction is ready if all of its predecessors in the dag have been executed.

We make the simplifying assumption that a parent thread remains alive until all its children die, and thus, a thread does not deallocate its activation frame until all its children's frames have been deallocated. Although this assumption is not absolutely necessary, it gives the execution a natural structure, and it will simplify our analyses of space utilization. In accounting for space utilization, we also assume that the frames hold all the values used by the computation; there is no global storage available to the computation outside the frames (or if such storage is available, then we do not account for it). Therefore, the space used at a given time in executing a computation is the total size of all frames used by all living threads at that time, and the total space used in executing a computation is the maximum such value over the course of the execution.

To summarize, a multithreaded computation can be viewed as a dag of instructions connected by dependency edges. The instructions are connected by continue edges into threads, and the threads form a spawn tree with the spawn edges. When a thread is spawned, an activation frame is allocated and this frame remains allocated as long as the thread remains alive. A living thread may be either ready or stalled due to an unresolved dependency.
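As an illustration of the model summarized above, the following Python sketch represents a computation as a dag of instructions and reports which instructions are ready. The class name `Computation`, its methods, and the tiny two-thread example (thread A with instructions a1, a2, a3 and a spawned child B with b1, b2) are hypothetical names chosen for this sketch; they are not taken from the paper or from Figure 1.

```python
from collections import defaultdict


class Computation:
    """A dag of unit-time instructions joined by dependency edges (illustrative only)."""

    def __init__(self):
        self.instructions = set()
        self.preds = defaultdict(set)   # instruction -> set of its predecessors
        self.kind = {}                  # (u, v) -> "continue" | "spawn" | "join"

    def add_edge(self, u, v, kind):
        self.instructions.update((u, v))
        self.preds[v].add(u)
        self.kind[(u, v)] = kind

    def ready(self, executed):
        """Unexecuted instructions whose predecessors have all been executed."""
        return {v for v in self.instructions
                if v not in executed and self.preds[v] <= executed}


# Hypothetical example: thread A = a1, a2, a3; a1 spawns thread B = b1, b2;
# a join edge from b2 enters a3 (not a2, the instruction right after the spawn).
c = Computation()
c.add_edge("a1", "a2", "continue")
c.add_edge("a2", "a3", "continue")
c.add_edge("a1", "b1", "spawn")
c.add_edge("b1", "b2", "continue")
c.add_edge("b2", "a3", "join")

print(c.ready(set()))            # only a1 has no predecessors
print(c.ready({"a1", "a2"}))     # a3 waits on the join edge, so thread A is stalled
```

In this sketch a thread is stalled exactly when its next instruction is absent from `ready(...)`, which mirrors the definition of ready instructions given in the text.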
A given multithreaded program when run on a given input can sometimes generate more than one multithreaded computation. In that case, we say the program is nondeterministic. If the same multithreaded computation is generated by the program on the input no matter how the computation is scheduled, then the program is deterministic. In this paper, we shall analyze multithreaded computations, not multithreaded programs. Specifically, we shall not worry about how the multithreaded computation is generated. Instead, we shall study its properties in an a posteriori fashion.

Because multithreaded computations with arbitrary dependencies can be impossible to schedule efficiently [10], we study subclasses of general multithreaded computations in which the kinds of synchronizations that can occur are restricted. A strict multithreaded computation is one in which all join edges from a thread go to an ancestor of the thread in the activation tree. In a strict computation, the only edge into a subtree (emanating from outside the subtree) is the spawn edge that spawns the subtree's root thread. For example, the computation of Figure 1 is strict, and the only edge into the subtree rooted at $\Gamma_2$ is the spawn edge $(v_2, v_3)$. Thus, strictness means that a thread cannot be invoked before all of its arguments are available, although the arguments can be garnered in parallel. A fully strict computation is one in which all join edges from a thread go to the thread's parent. A fully strict computation is, in a sense, a "well-structured" computation, in that all join edges from a subtree (of the spawn tree) emanate from the subtree's root. The example computation of Figure 1 is fully strict. Any multithreaded computation that can be executed in a depth-first manner on a single processor can be made either strict or fully strict by altering the dependency structure, possibly affecting the achievable parallelism, but not affecting the semantics of the computation [5].

We quantify and bound the execution time of a computation on a $P$-processor parallel computer in terms of the computation's "work" and "critical-path length." We define the work of the computation to be the total number of instructions and the critical-path length to be the length of a longest directed path in the dag. Our example computation (Figure 1) has work 23 and critical-path length 10. For a given computation, let $T(X)$ denote the time to execute the computation using $P$-processor execution schedule $X$, and let

$$T_P = \min_X T(X)$$

denote the minimum execution time with $P$ processors, the minimum being taken over all $P$-processor execution schedules for the computation. Then $T_1$ is the work of the computation, since a 1-processor computer can only execute one instruction at each step, and $T_\infty$ is the critical-path length, since even with arbitrarily many processors, each instruction on a path must execute serially. Notice that we must have $T_P \ge T_1/P$, because $P$ processors can execute only $P$ instructions per time step, and of course, we must have $T_P \ge T_\infty$.

Early work on dag scheduling by Brent [13] and Graham [27, 28] shows that there exist $P$-processor execution schedules $X$ with $T(X) \le T_1/P + T_\infty$. As the sum of two lower bounds, this upper bound is universally optimal to within a factor of 2. The following theorem, proved in [10, 20], extends these results minimally to show that this upper bound on $T_P$ can be obtained by greedy schedules: those in which at each step of the execution, if at least $P$ instructions are ready, then $P$ instructions execute, and if fewer than $P$ instructions are ready, then all execute.

Theorem 1 (The greedy-scheduling theorem) For any multithreaded computation with work $T_1$ and critical-path length $T_\infty$, and for any number $P$ of processors, any greedy $P$-processor execution schedule $X$ achieves $T(X) \le T_1/P + T_\infty$.

Generally, we are interested in schedules that achieve linear speedup, that is $T(X) = O(T_1/P)$. For a greedy schedule, linear speedup occurs when the parallelism, which we define to be $T_1/T_\infty$, satisfies $T_1/T_\infty = \Omega(P)$.
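To make the greedy-scheduling bound concrete, the following Python sketch computes the work and critical-path length of a small dag and simulates one greedy schedule on it. The function names and the five-instruction example dag are assumptions made for illustration; the critical-path length is counted in instructions on a longest path, matching the definition of $T_\infty$ as the execution time with arbitrarily many processors. The simulation picks ready instructions in arrival order, which is only one of many possible greedy schedules.

```python
from collections import deque


def build(nodes, edges):
    """Successor lists and in-degrees for a dag given as (u, v) edge pairs."""
    succs = {v: [] for v in nodes}
    indeg = {v: 0 for v in nodes}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1
    return succs, indeg


def work_and_critical_path(nodes, edges):
    """Return (T_1, T_inf): instruction count and longest path in instructions."""
    succs, indeg = build(nodes, edges)
    longest = {v: 1 for v in nodes}
    queue = deque(v for v in nodes if indeg[v] == 0)
    while queue:                      # process instructions in topological order
        u = queue.popleft()
        for v in succs[u]:
            longest[v] = max(longest[v], longest[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return len(nodes), max(longest.values())


def greedy_time(nodes, edges, P):
    """Number of steps taken by one greedy P-processor schedule."""
    succs, indeg = build(nodes, edges)
    ready = [v for v in nodes if indeg[v] == 0]
    steps = 0
    while ready:
        # Execute min(P, number of ready instructions) instructions this step.
        batch, ready = ready[:P], ready[P:]
        for u in batch:
            for v in succs[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)   # becomes ready for the next step
        steps += 1
    return steps


# Hypothetical dag: thread a1 -> a2 -> a3, child b1 -> b2 spawned by a1, join b2 -> a3.
NODES = ["a1", "a2", "a3", "b1", "b2"]
EDGES = [("a1", "a2"), ("a2", "a3"), ("a1", "b1"), ("b1", "b2"), ("b2", "a3")]

T1, Tinf = work_and_critical_path(NODES, EDGES)
for P in (1, 2, 3):
    T = greedy_time(NODES, EDGES, P)
    assert max(T1 / P, Tinf) <= T <= T1 / P + Tinf
```

The final assertion checks both lower bounds and the upper bound of Theorem 1 for each simulated schedule.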
To quantify the space used by a given execution schedule of a computation, we define the stack depth of a thread to be the sum of the sizes of the activation frames of all its ancestors, including itself. The stack depth of a multithreaded computation is the maximum stack depth of any of its threads. We shall denote by $S_1$ the minimum amount of space possible for any 1-processor execution of a multithreaded computation, which is equal to the stack depth of the computation. Let $S(X)$ denote the space used by a $P$-processor execution schedule $X$ of a multithreaded computation. We shall be interested in those execution schedules that exhibit at most linear expansion of space, that is, $S(X) = O(S_1 P)$, which is existentially optimal to within a constant factor [10].
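The stack-depth definition can be illustrated with a short Python sketch. The spawn-tree shape below follows the description of Figure 1 (root thread 1 with children 2 and 6, and thread 2 with children 3, 4, and 5), but the frame sizes are invented for the example, and the names `PARENT`, `FRAME_SIZE`, and `stack_depth` are hypothetical.

```python
# Spawn tree of Figure 1 as a child -> parent map (the root has parent None).
PARENT = {1: None, 2: 1, 6: 1, 3: 2, 4: 2, 5: 2}

# Activation-frame sizes: assumed values, not given in the paper.
FRAME_SIZE = {1: 4, 2: 2, 3: 1, 4: 3, 5: 1, 6: 5}


def stack_depth(thread):
    """Sum of the frame sizes of a thread and all of its ancestors."""
    depth = 0
    while thread is not None:
        depth += FRAME_SIZE[thread]
        thread = PARENT[thread]
    return depth


# S_1 is the maximum stack depth over all threads of the computation.
S1 = max(stack_depth(t) for t in PARENT)
P = 2
print(S1, S1 * P)   # serial space and the linear-expansion bound S_1 * P
```

With these assumed frame sizes, threads 4 and 6 both reach the maximum stack depth, so either one determines $S_1$.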

3 The busy-leaves property

Once a thread $\Gamma$ has been spawned in a strict computation, a single processor can complete the execution of the entire subcomputation rooted at $\Gamma$ even if no other progress is made on other parts of the computation. In other words, from the time the thread $\Gamma$ is spawned until the time $\Gamma$ dies, there is always at least one thread from the subcomputation rooted at $\Gamma$ that is ready. In particular, no leaf thread in a strict multithreaded computation can stall. As we shall see, this property allows an execution schedule to keep the leaves "busy." By combining this "busy-leaves" property with the greedy property, we derive execution schedules that simultaneously exhibit linear speedup and linear expansion of space.

In this section, we show that for any number $P$ of processors and any strict multithreaded computation with work $T_1$, critical-path length $T_\infty$, and stack depth $S_1$, there exists a $P$-processor execution schedule $X$ that achieves time $T(X) \le T_1/P + T_\infty$ and space $S(X) \le S_1 P$ simultaneously. We give a simple online $P$-processor parallel algorithm (the Busy-Leaves Algorithm) to compute such a schedule. This simple algorithm will form the basis for the randomized work-stealing algorithm presented in Section 4.

The Busy-Leaves Algorithm operates online in the following sense. Before the $t$th step, the algorithm has computed and executed the first $t - 1$ steps of the execution schedule. At the $t$th step, the algorithm uses only information from the portion of the computation revealed so far in the execution to compute and execute the $t$th step of the schedule. In particular, it does not use any information from instructions not yet executed or threads not yet spawned.

The Busy-Leaves Algorithm maintains all living threads in a single thread pool which is uniformly available to all $P$ processors. When spawns occur, new threads are added to this global pool, and when a processor needs work, it removes a ready thread from the pool. Though we describe the algorithm as a $P$-processor parallel algorithm, we shall not analyze it as such. Specifically, in computing the $t$th step of the schedule, we allow each processor to add threads to the thread pool and delete threads from it. Thus, we ignore the effects of processors contending for access to the pool. In fact, we shall only analyze properties of the schedule itself and ignore the cost incurred by the algorithm in computing the schedule. (Scheduling overheads will be analyzed for the randomized work-stealing algorithm, however.)

The Busy-Leaves Algorithm operates as follows. The algorithm begins with the root thread in the global thread pool and all processors idle. At the beginning of each step, each processor either is idle or has a thread to work on. Those processors that are idle begin the step by attempting to remove any ready thread from the pool. If there are sufficiently many ready threads in the pool to satisfy all of the idle processors, then every idle processor gets a ready thread to work on. Otherwise, some processors remain idle. Then, each processor that has a thread to work on executes the next instruction from that thread. In general, once a processor has a thread, call it $\Gamma_a$, to work on, it executes an instruction from $\Gamma_a$ at each step until the thread either spawns, stalls, or dies, in which case, it performs according to the following rules.

➊ Spawns: If the thread $\Gamma_a$ spawns a child $\Gamma_b$, then the processor finishes the current step by returning $\Gamma_a$ to the thread pool. The processor begins the next step working on $\Gamma_b$.

➋ Stalls: If the thread $\Gamma_a$ stalls, then the processor finishes the current step by returning $\Gamma_a$ to the thread pool. The processor begins the next step idle.

➌ Dies: If the thread $\Gamma_a$ dies, then the processor finishes the current step by checking to see if $\Gamma_a$'s parent thread $\Gamma_b$ currently has any living children. If $\Gamma_b$ has no live children and no other processor is working on $\Gamma_b$, then the processor takes $\Gamma_b$ from the pool and begins the next step working on $\Gamma_b$. Otherwise, the processor begins the next step idle.

Figure 2: A 2-processor execution schedule computed by the Busy-Leaves Algorithm for the computation of Figure 1. This schedule lists the living threads in the global thread pool at each step just after each idle processor has removed a ready thread. It also lists the ready thread being worked on and the instruction executed by each of the 2 processors, $p_1$ and $p_2$, at each step. Living threads that are ready are listed in bold. The other living threads are stalled. (Table columns: step, thread pool, processor activity.)

Figure 2 illustrates these three rules in a 2-processor execution schedule computed by the Busy-Leaves Algorithm on the computation of Figure 1. Rule ➊: At step 2, processor $p_1$ working on thread $\Gamma_1$ executes $v_2$ which spawns the child $\Gamma_2$, so $p_1$ places $\Gamma_1$ back in the pool (to be picked up at the beginning of the next step by the idle $p_2$) and begins the next step working on $\Gamma_2$. Rule ➋: At step 8, processor $p_2$ working on thread $\Gamma_1$ executes $v_{21}$ and $\Gamma_1$ stalls, so $p_2$ returns $\Gamma_1$ to the pool and begins the next step idle (and remains idle since the thread pool contains no ready threads). Rule ➌: At step 13, processor $p_1$ working on $\Gamma_2$ executes $v_{15}$ and $\Gamma_2$ dies, so $p_1$ retrieves the parent $\Gamma_1$ from the pool and begins the next step working on $\Gamma_1$.

Besides being greedy, for any strict computation, the schedule computed by the Busy-Leaves Algorithm maintains the busy-leaves property: at every time step during the execution, every leaf in the "spawn subtree" has a processor working on it. We define the spawn subtree at any time step $t$ to be the portion of the spawn tree consisting of just those threads that are alive at step $t$. To restate the busy-leaves property, at every time step, every living thread that has no living descendants has a processor working on it. We shall now prove this fact and show that it implies linear expansion of space. It is worth noting that not every multithreaded computation has a schedule that maintains the busy-leaves property, but every strict multithreaded computation does. We begin by showing that any schedule that maintains the busy-leaves property exhibits linear expansion of space.

Lemma 2 For any multithreaded computation with stack depth $S_1$, any $P$-processor execution schedule $X$ that maintains the busy-leaves property uses space bounded by $S(X) \le S_1 P$.

Proof: The busy-leaves property implies that at all time steps $t$, the spawn subtree has at most $P$ leaves. For each such leaf, the space used by it and all of its ancestors is at most $S_1$, and therefore, the space in use at any time step $t$ is at most $S_1 P$.

For schedules that maintain the busy-leaves property, the upper bound $S_1 P$ is conservative. By charging $S_1$ space for each busy leaf, we may be overcharging. For some computations, by knowing that the schedule preserves the busy-leaves property, we can appeal directly to the fact that the spawn subtree never has more than $P$ leaves to obtain tight bounds on space usage [6].

We finish this section by showing that for strict computations, the Busy-Leaves Algorithm computes a schedule that is both greedy and maintains the busy-leaves property.

Theorem 3 For any number $P$ of processors and any strict multithreaded computation with work $T_1$, critical-path length $T_\infty$, and stack depth $S_1$, the Busy-Leaves Algorithm computes a $P$-processor execution schedule $X$ whose execution time satisfies $T(X) \le T_1/P + T_\infty$ and whose space satisfies $S(X) \le S_1 P$.

Proof: The time bound follows directly from the greedy-scheduling theorem (Theorem 1), since the Busy-Leaves Algorithm computes a greedy schedule. The space bound follows from Lemma 2 if we can show that the Busy-Leaves Algorithm maintains the busy-leaves property. We prove this fact by induction on the number of steps. At the first step of the algorithm, the spawn subtree contains just the root thread which is a leaf, and some processor is working on it. We must show that all of the algorithm rules preserve the busy-leaves property. When a processor has a thread $\Gamma_a$ to work on, it executes instructions from that thread until it either spawns, stalls, or dies. Rule ➊: If $\Gamma_a$ spawns a child $\Gamma_b$, then $\Gamma_a$ is not a leaf (even if it was before) and $\Gamma_b$ is a leaf. In this case, the processor works on $\Gamma_b$, so the new leaf is busy. Rule ➋: If $\Gamma_a$ stalls, then $\Gamma_a$ cannot be a leaf since in a strict computation, the unresolved dependency must come from a descendant. Rule ➌: If $\Gamma_a$ dies, then its parent thread $\Gamma_b$ may turn into a leaf. In this case, the processor works on $\Gamma_b$ unless some other processor already is, so the new leaf is guaranteed to be busy.

We now know that every strict multithreaded computation has an efficient execution schedule, and we know how to find it. But these facts take us only so far. Execution schedules must be computed efficiently online, and though the Busy-Leaves Algorithm does compute efficient execution schedules and does operate online, it surely does not do so efficiently, except possibly in the case of small-scale symmetric multiprocessors. This lack of scalability is a consequence of employing a single centralized thread pool at which all processors must contend for access. In the next section, we present a distributed online scheduling algorithm, and in the following sections, we prove that it is both efficient and scalable.
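The busy-leaves property discussed in this section can be checked for a single time step with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the spawn tree is the one described for Figure 1, and the two snapshots of living threads are invented states rather than steps taken from Figure 2.

```python
# Spawn tree of Figure 1 as a child -> parent map (the root has parent None).
PARENT = {1: None, 2: 1, 6: 1, 3: 2, 4: 2, 5: 2}


def spawn_subtree_leaves(living):
    """Living threads that have no living children."""
    has_living_child = {PARENT[t] for t in living if PARENT[t] in living}
    return {t for t in living if t not in has_living_child}


def busy_leaves_holds(living, working_on):
    """True if every leaf of the spawn subtree has a processor working on it."""
    return spawn_subtree_leaves(living) <= set(working_on)


# Hypothetical snapshot 1: threads 1, 2, 3 alive; a processor works on thread 3.
print(busy_leaves_holds({1, 2, 3}, {3}))      # the only leaf, thread 3, is busy

# Hypothetical snapshot 2: thread 6 is also alive but nobody works on it.
print(busy_leaves_holds({1, 2, 3, 6}, {3}))   # leaf 6 is idle, so the property fails
```

When the property holds, the number of leaves is at most the number of processors, which is the fact used in the proof of Lemma 2.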

4 A randomized work-stealing algorithm

In this section, we present an online, randomized work-stealing algorithm for scheduling multithreaded computations on a parallel computer. Also, we present an important structural lemma which is used at the end of this section to show that for fully strict computations, this algorithm causes at most a linear expansion of space. This lemma reappears in Section 6 to show that for fully strict computations, this algorithm achieves linear speedup and generates existentially optimal amounts of communication.

In the Work-Stealing Algorithm, the centralized thread pool of the Busy-Leaves Algorithm is distributed across the processors. Specifically, each processor maintains a ready deque data structure of threads. The ready deque has two ends: a top and a bottom. Threads can be inserted on the bottom and removed from either end. A processor treats its ready deque like a call stack, pushing and popping from the bottom. Threads that are migrated to other processors are removed from the top.

In general, a processor obtains work by removing the thread at the bottom of its ready deque. It starts working on the thread, call it $\Gamma_a$, and continues executing $\Gamma_a$'s instructions until $\Gamma_a$ spawns, stalls, dies, or enables a stalled thread, in which case, it performs according to the following rules.

➊ Spawns: If the thread $\Gamma_a$ spawns a child $\Gamma_b$, then $\Gamma_a$ is placed on the bottom of the ready deque, and the processor commences work on $\Gamma_b$.

➋ Stalls: If the thread $\Gamma_a$ stalls, its processor checks the ready deque. If the deque contains any threads, then the processor removes and begins work on the bottommost thread. If the ready deque is empty, however, the processor begins work stealing: it steals the topmost thread from the ready deque of a randomly chosen processor and begins work on it. (This work-stealing strategy is elaborated below.)

➌ Dies: If the thread $\Gamma_a$ dies, then the processor follows rule ➋ as in the case of $\Gamma_a$ stalling.

➍ Enables: If the thread $\Gamma_a$ enables a stalled thread $\Gamma_b$, the now-ready thread $\Gamma_b$ is placed on the bottom of the ready deque of $\Gamma_a$'s processor.
A thread can simultaneously enable a stalled thread and stall or die, in which case we first perform rule ➍ for enabling and then rule ➋ for stalling or rule ➌ for dying. Except for rule ➍ for the case when a thread enables a stalled thread, these rules are analogous to the rules of the Busy-Leaves Algorithm, and as we shall see, rule ➍ is needed to ensure that the algorithm maintains important structural properties, including the busy-leaves property.

The Work-Stealing Algorithm begins with all ready deques empty. The root thread of the multithreaded computation is placed in the ready deque of one processor, while the other processors start work stealing.

When a processor begins work stealing, it operates as follows. The processor becomes a thief and attempts to steal work from a victim processor chosen uniformly at random. The thief queries the ready deque of the victim, and if it is nonempty, the thief removes and begins work on the top thread. If the victim's ready deque is empty, however, the thief tries again, picking another victim at random.
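The ready-deque discipline and the stealing loop described above can be sketched sequentially in Python. This is a single-threaded illustration, not a concurrent implementation: the names `Processor`, `push_bottom`, `pop_bottom`, `steal_top`, and `obtain_work` are hypothetical, the victim is drawn from the other processors only, and the `max_attempts` cutoff is added here so the sketch terminates when no work exists anywhere.

```python
import random
from collections import deque


class Processor:
    """Holds one ready deque; left end is the top, right end is the bottom."""

    def __init__(self, name):
        self.name = name
        self.ready = deque()

    def push_bottom(self, thread):
        self.ready.append(thread)

    def pop_bottom(self):
        return self.ready.pop() if self.ready else None

    def steal_top(self):
        return self.ready.popleft() if self.ready else None


def obtain_work(p, processors, rng, max_attempts=100):
    """Pop from p's own deque, or become a thief and steal from random victims."""
    thread = p.pop_bottom()
    if thread is not None:
        return thread
    others = [q for q in processors if q is not p]
    for _ in range(max_attempts):          # cutoff is an assumption of this sketch
        victim = rng.choice(others)        # victim chosen uniformly at random
        thread = victim.steal_top()
        if thread is not None:
            return thread
    return None


rng = random.Random(0)
p0, p1 = Processor("p0"), Processor("p1")
for t in ("A", "B", "C"):                  # A ends up topmost, C bottommost
    p0.push_bottom(t)

stolen = obtain_work(p1, [p0, p1], rng)    # p1 is empty, so it steals p0's top thread
local = obtain_work(p0, [p0, p1], rng)     # p0 pops its own bottommost thread
print(stolen, local)
```

The owner works at the bottom of its deque while thieves take from the top, so the two ends are used by different parties, as in the description of the algorithm.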

Figure 3: The structure of a processor's ready deque. The black instruction in each thread indicates the thread's currently ready instruction. Only thread $\Gamma_k$ may have been worked on since it last spawned a child. The dashed edges are the "deque edges" introduced in Section 6.

We now state and prove an important lemma on the structure of threads in the ready deque of any processor during the execution of a fully strict computation. This lemma is used later in this section to analyze execution space and in Section 6 to analyze execution time and communication. Figure 3 illustrates the lemma.

Lemma 4 In the execution of any fully strict multithreaded computation by the Work-Stealing Algorithm, consider any processor $p$ and any given time step at which $p$ is working on a thread. Let $\Gamma_0$ be the thread that $p$ is working on, let $k$ be the number of threads in $p$'s ready deque, and let $\Gamma_1, \Gamma_2, \ldots, \Gamma_k$ denote the threads in $p$'s ready deque ordered from bottom to top, so that $\Gamma_1$ is the bottommost and $\Gamma_k$ is the topmost. If we have $k > 0$, then the threads in $p$'s ready deque satisfy the following properties:

➀ For $i = 1, 2, \ldots, k$, thread $\Gamma_i$ is the parent of $\Gamma_{i-1}$.

➁ If we have $k > 1$, then for $i = 1, 2, \ldots, k-1$, thread $\Gamma_i$ has not been worked on since it spawned $\Gamma_{i-1}$.

Proof: The proof is a straightforward induction on execution time. Execution begins with the root thread in some processor's ready deque and all other ready deques empty, so the lemma vacuously holds at the outset. Now, consider any step of the algorithm at which processor $p$ executes an instruction from thread $\Gamma_0$. Let $\Gamma_1, \Gamma_2, \ldots, \Gamma_k$ denote the $k$ threads in $p$'s ready deque before the step, and suppose that either $k = 0$ or both properties hold. Let $\Gamma'_0$ denote the thread (if any) being worked on by $p$ after the step, and let $\Gamma'_1, \Gamma'_2, \ldots, \Gamma'_{k'}$ denote the $k'$ threads in $p$'s ready deque after the step. We now look at the rules of the algorithm and show that they all preserve the lemma. That is, either $k' = 0$ or both properties hold after the step.

Rule ➊: If $\Gamma_0$ spawns a child, then $p$ pushes $\Gamma_0$ onto the bottom of the ready deque and commences work on the child. Thus, $\Gamma'_0$ is the child, we have $k' = k + 1 > 0$, and for $j = 1, 2, \ldots, k'$, we have $\Gamma'_j = \Gamma_{j-1}$. See Figure 4. Now, we can check both properties. Property ➀: If $k' > 1$, then for $j = 2, 3, \ldots, k'$, thread $\Gamma'_j$ is the parent of $\Gamma'_{j-1}$, since before the spawn we have $k > 0$, which means that for $i = 1, 2, \ldots, k$, thread $\Gamma_i$ is the parent of $\Gamma_{i-1}$. Moreover, $\Gamma'_1$ is obviously the parent of $\Gamma'_0$. Property ➁: If $k' > 2$, then for $j = 2, 3, \ldots, k'-1$, thread $\Gamma'_j$ has not been worked on since it spawned $\Gamma'_{j-1}$, because before the spawn we have $k > 1$, which means that for $i = 1, 2, \ldots, k-1$, thread $\Gamma_i$ has not been worked on since it spawned $\Gamma_{i-1}$. Finally, thread $\Gamma'_1$ has not been worked on since it spawned $\Gamma'_0$, because the spawn only just occurred.

Figure 4: The ready deque of a processor before and after the thread $\Gamma_0$ that it is working on spawns a child. (Note that the threads $\Gamma_0$ and $\Gamma'_0$ are not actually in the deque; they are the threads being worked on before and after the spawn.) (a) Before spawn. (b) After spawn.

Rules ➋ and ➌: If $\Gamma_0$ stalls or dies, then we have two cases to consider. If $k = 0$, the ready deque is empty, so the processor commences work stealing, and when the processor steals and begins work on a thread, we have $k' = 0$. If $k > 0$, the ready deque is not empty, so the processor pops the bottommost thread off the deque and commences work on it. Thus, we have $\Gamma'_0 = \Gamma_1$ (the popped thread) and $k' = k - 1$, and for $j = 1, 2, \ldots, k'$, we have $\Gamma'_j = \Gamma_{j+1}$. See Figure 5. Now, if $k' > 0$, we can check both properties. Property ➀: For $j = 1, 2, \ldots, k'$, thread $\Gamma'_j$ is the parent of $\Gamma'_{j-1}$, since for $i = 1, 2, \ldots, k$, thread $\Gamma_i$ is the parent of $\Gamma_{i-1}$. Property ➁: If $k' > 1$, then for $j = 1, 2, \ldots, k'-1$, thread $\Gamma'_j$ has not been worked on since it spawned $\Gamma'_{j-1}$, because before the stall or death we have $k > 2$, which means that for $i = 2, 3, \ldots, k-1$, thread $\Gamma_i$ has not been worked on since it spawned $\Gamma_{i-1}$.

Figure 5: The ready deque of a processor before and after the thread $\Gamma_0$ that it is working on dies. (Note that the threads $\Gamma_0$ and $\Gamma'_0$ are not actually in the deque; they are the threads being worked on before and after the death.) (a) Before death. (b) After death.

Rule ➍: If $\Gamma_0$ enables a stalled thread, then due to the fully strict condition, that previously stalled thread must be $\Gamma_0$'s parent. First, we observe that we must have $k = 0$. If


More information

Job Scheduling Model. problem scenario: a set of jobs needs to be executed using a single server, on which only one job at a time may run

Job Scheduling Model. problem scenario: a set of jobs needs to be executed using a single server, on which only one job at a time may run Scheduling 1 Job Scheduling Model problem scenario: a set of jobs needs to be executed using a single server, on which only one job at a time may run for theith job, we have an arrival timea i and a run

More information

Chapter 6: CPU Scheduling

Chapter 6: CPU Scheduling Chapter 6: CPU Scheduling Edited by Ghada Ahmed, PhD ghada@fcih.net Silberschatz, Galvin and Gagne 2013 Basic Concepts Maximum CPU utilization obtained with multiprogramming CPU I/O Burst Cycle Process

More information

Multiplicity. Chapter 6

Multiplicity. Chapter 6 Chapter 6 Multiplicity The fundamental theorem of algebra says that any polynomial of degree n 0 has exactly n roots in the complex numbers if we count with multiplicity. The zeros of a polynomial are

More information

Code and Process Migration! Motivation!

Code and Process Migration! Motivation! Code and Process Migration! Motivation How does migration occur? Resource migration Agent-based system Details of process migration Lecture 6, page 1 Motivation! Key reasons: performance and flexibility

More information

Job Scheduling Model

Job Scheduling Model Scheduling 1 Job Scheduling Model problem scenario: a set of jobs needs to be executed using a single server, on which only one job at a time may run for theith job, we have an arrival timea i and a run

More information

MANAGEMENT CONSOLE PERFORMANCE COMPARISON

MANAGEMENT CONSOLE PERFORMANCE COMPARISON MANAGEMENT CONSOLE PERFORMANCE COMPARISON Performance comparison between Blancco Management Console 3.3.2 and 3.6.0 www.blanccotechnologygroup.com TABLE OF CONTENTS 1. Overview... 3 Method... 3 2. Performance

More information

Java SE 7 Programming

Java SE 7 Programming Java SE 7 Programming The second of two courses that cover the Java Standard Edition 7 (Java SE 7) Platform, this course covers the core Application Programming Interfaces (API) you will use to design

More information

Distributed Computing over Communication Networks: Maximal Independent Set

Distributed Computing over Communication Networks: Maximal Independent Set Distributed Computing over Communication Networks: Maximal Independent Set What is a MIS? MIS An independent set (IS) of an undirected graph is a subset U of nodes such that no two nodes in U are adjacent.

More information

Oracle Health Sciences Network Patient Recruiter Cloud Service Services Description Version: 2.0 Effective Date: 01-April-2013

Oracle Health Sciences Network Patient Recruiter Cloud Service Services Description Version: 2.0 Effective Date: 01-April-2013 Oracle Health Sciences Network Patient Recruiter Cloud Service Services Description Version: 2.0 Effective Date: 01-April-2013 Oracle Health Sciences Network Patient Recruiter Cloud Service - Services

More information

Dynamic Thread Pool based Service Tracking Manager

Dynamic Thread Pool based Service Tracking Manager Dynamic Thread Pool based Service Tracking Manager D.V.Lavanya, V.K.Govindan Department of Computer Science & Engineering National Institute of Technology Calicut Calicut, India e-mail: lavanya.vijaysri@gmail.com,

More information

Best Practice - Tomcat Performance Tuning for Pentaho

Best Practice - Tomcat Performance Tuning for Pentaho Best Practice - Tomcat Performance Tuning for Pentaho This page has intentionally been left blank. Contents Overview... 1 Tuning Tomcat for Performance... 2 Choose the Right Connector... 2 Set the Right

More information

EECS 750: Advanced Operating Systems. 01/28 /2015 Heechul Yun

EECS 750: Advanced Operating Systems. 01/28 /2015 Heechul Yun EECS 750: Advanced Operating Systems 01/28 /2015 Heechul Yun 1 Recap: Completely Fair Scheduler(CFS) Each task maintains its virtual time V i = E i 1 w i, where E is executed time, w is a weight Pick the

More information

Technology: Enterprise Storage

Technology: Enterprise Storage Accelerated Capital Allowances Eligibility Criteria Category: Information and Communications Technology (ICT) Technology: Enterprise Storage Enterprise Storage equipment is defined as a storage device

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.

More information

Capacity Scheduler Guide

Capacity Scheduler Guide Table of contents 1 Purpose...2 2 Features... 2 3 Picking a task to run...2 4 Installation...3 5 Configuration... 3 5.1 Using the Capacity Scheduler... 3 5.2 Setting up queues...3 5.3 Configuring properties

More information

Bayesian networks - Time-series models - Apache Spark & Scala

Bayesian networks - Time-series models - Apache Spark & Scala Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly

More information

Application. Performance Testing

Application. Performance Testing Application Performance Testing www.mohandespishegan.com شرکت مهندش پیشگان آزمون افسار یاش Performance Testing March 2015 1 TOC Software performance engineering Performance testing terminology Performance

More information

CS416 CPU Scheduling

CS416 CPU Scheduling CS416 CPU Scheduling CS 416: Operating Systems Design, Spring 2011 Department of Computer Science Rutgers University Rutgers Sakai: 01:198:416 Sp11 (https://sakai.rutgers.edu) Assumptions Pool of jobs

More information

TIME VALUE OF MONEY. Return of vs. Return on Investment: We EXPECT to get more than we invest!

TIME VALUE OF MONEY. Return of vs. Return on Investment: We EXPECT to get more than we invest! TIME VALUE OF MONEY Return of vs. Return on Investment: We EXPECT to get more than we invest! Invest $1,000 it becomes $1,050 $1,000 return of $50 return on Factors to consider when assessing Return on

More information

Memory Characterization to Analyze and Predict Multimedia Performance and Power in an Application Processor

Memory Characterization to Analyze and Predict Multimedia Performance and Power in an Application Processor WHITE PAPER Memory Characterization to Analyze and Predict Multimedia Performance and Power in an Application Processor Yu Bai Staff Engineer, APSE Marvell November 2011 www.marvell.com Introduction: Nowadays,

More information

Curriculum. for the Master s degree programme. Applied Informatics. Programme code L 066 911. Effective date: 1 st of October 2013

Curriculum. for the Master s degree programme. Applied Informatics. Programme code L 066 911. Effective date: 1 st of October 2013 APPENDIX 1 to the University Bulletin Issue 20, No. 159.1-2012/2013, 19.06.2013 Curriculum for the Master s degree programme Applied Informatics Programme code L 066 911 Effective date: 1 st of October

More information

CPU Scheduling. Multiprogrammed OS

CPU Scheduling. Multiprogrammed OS CPU Scheduling Multiprogrammed OS Efficient Use of Processor By switching between jobs Thread scheduling and process scheduling often used interchangeably Which is done depends on implementation of O/S.

More information

CASE STUDY: Do Press Releases Work?

CASE STUDY: Do Press Releases Work? CASE STUDY: Do Press Releases Work? Do Press Releases Work? Case Study: Is Press Release USELESS, or USEFUL? Copyright 2014 MarketersMedia - Press Release Distribution All Rights Reserved 2 Introduction:

More information

Deadlock Detection and Recovery!

Deadlock Detection and Recovery! Deadlock Detection and Recovery! Richard M. Fujimoto! Professor!! Computational Science and Engineering Division! College of Computing! Georgia Institute of Technology! Atlanta, GA 30332-0765, USA!! http://www.cc.gatech.edu/~fujimoto/!

More information

Load Balancing & Termination

Load Balancing & Termination Load Load & Termination Lecture 7 Load and Termination Detection Load Load Want all processors to operate continuously to minimize time to completion. Load balancing determines what work will be done by

More information

Math. Rounding Decimals. Answers. 1) Round to the nearest tenth. 8.54 8.5. 2) Round to the nearest whole number. 99.59 100

Math. Rounding Decimals. Answers. 1) Round to the nearest tenth. 8.54 8.5. 2) Round to the nearest whole number. 99.59 100 1) Round to the nearest tenth. 8.54 8.5 2) Round to the nearest whole number. 99.59 100 3) Round to the nearest tenth. 310.286 310.3 4) Round to the nearest whole number. 6.4 6 5) Round to the nearest

More information

Comp 204: Computer Systems and Their Implementation. Lecture 12: Scheduling Algorithms cont d

Comp 204: Computer Systems and Their Implementation. Lecture 12: Scheduling Algorithms cont d Comp 204: Computer Systems and Their Implementation Lecture 12: Scheduling Algorithms cont d 1 Today Scheduling continued Multilevel queues Examples Thread scheduling 2 Question A starvation-free job-scheduling

More information

CPU Scheduling Yi Shi Fall 2015 Xi an Jiaotong University

CPU Scheduling Yi Shi Fall 2015 Xi an Jiaotong University CPU Scheduling Yi Shi Fall 2015 Xi an Jiaotong University Goals for Today CPU Schedulers Scheduling Algorithms Algorithm Evaluation Metrics Algorithm details Thread Scheduling Multiple-Processor Scheduling

More information

Technician High Pressure Pump Guide for the 7.3 Power Stroke Engine

Technician High Pressure Pump Guide for the 7.3 Power Stroke Engine Technician High Pressure Pump Guide for the 7.3 Power Stroke Engine HIGH PRESSURE PUMP. PUMP LEAKS. ICP SYSTEM DIAGNOSTICS. REPAIR PARTS. TOOLS IPR TEST TOOLS AND ICP PUMP LEAK REPAIR High pressure pumps

More information

Java SE 7 Programming

Java SE 7 Programming Oracle University Contact Us: 1.800.529.0165 Java SE 7 Programming Duration: 5 Days What you will learn This Java SE 7 Programming training explores the core Application Programming Interfaces (API) you'll

More information

Java SE 7 Programming

Java SE 7 Programming Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 4108 4709 Java SE 7 Programming Duration: 5 Days What you will learn This Java Programming training covers the core Application Programming

More information

Artificial Intelligence Beating Human Opponents in Poker

Artificial Intelligence Beating Human Opponents in Poker Artificial Intelligence Beating Human Opponents in Poker Stephen Bozak University of Rochester Independent Research Project May 8, 26 Abstract In the popular Poker game, Texas Hold Em, there are never

More information

Load balancing in SOAJA (Service Oriented Java Adaptive Applications)

Load balancing in SOAJA (Service Oriented Java Adaptive Applications) Load balancing in SOAJA (Service Oriented Java Adaptive Applications) Richard Olejnik Université des Sciences et Technologies de Lille Laboratoire d Informatique Fondamentale de Lille (LIFL UMR CNRS 8022)

More information

CS Standards Crosswalk: CSTA K-12 Computer Science Standards and Oracle Java Programming (2014)

CS Standards Crosswalk: CSTA K-12 Computer Science Standards and Oracle Java Programming (2014) CS Standards Crosswalk: CSTA K-12 Computer Science Standards and Oracle Java Programming (2014) CSTA Website Oracle Website Oracle Contact http://csta.acm.org/curriculum/sub/k12standards.html https://academy.oracle.com/oa-web-introcs-curriculum.html

More information

COMPLEX EMBEDDED SYSTEMS

COMPLEX EMBEDDED SYSTEMS COMPLEX EMBEDDED SYSTEMS Real-Time Scheduling Summer Semester 2012 System and Software Engineering Prof. Dr.-Ing. Armin Zimmermann Contents Introduction Scheduling in Interactive Systems Real-Time Scheduling

More information

Threads (Ch.4) ! Many software packages are multi-threaded. ! A thread is sometimes called a lightweight process

Threads (Ch.4) ! Many software packages are multi-threaded. ! A thread is sometimes called a lightweight process Threads (Ch.4)! Many software packages are multi-threaded l Web browser: one thread display images, another thread retrieves data from the network l Word processor: threads for displaying graphics, reading

More information

Announcements. Basic Concepts. Histogram of Typical CPU- Burst Times. Dispatcher. CPU Scheduler. Burst Cycle. Reading

Announcements. Basic Concepts. Histogram of Typical CPU- Burst Times. Dispatcher. CPU Scheduler. Burst Cycle. Reading Announcements Reading Chapter 5 Chapter 7 (Monday or Wednesday) Basic Concepts CPU I/O burst cycle Process execution consists of a cycle of CPU execution and I/O wait. CPU burst distribution What are the

More information

Dynamic Load Balancing. Using Work-Stealing 35.1 INTRODUCTION CHAPTER. Daniel Cederman and Philippas Tsigas

Dynamic Load Balancing. Using Work-Stealing 35.1 INTRODUCTION CHAPTER. Daniel Cederman and Philippas Tsigas CHAPTER Dynamic Load Balancing 35 Using Work-Stealing Daniel Cederman and Philippas Tsigas In this chapter, we present a methodology for efficient load balancing of computational problems that can be easily

More information

Tenacity and rupture degree of permutation graphs of complete bipartite graphs

Tenacity and rupture degree of permutation graphs of complete bipartite graphs Tenacity and rupture degree of permutation graphs of complete bipartite graphs Fengwei Li, Qingfang Ye and Xueliang Li Department of mathematics, Shaoxing University, Shaoxing Zhejiang 312000, P.R. China

More information

NUMERICAL CALCULATION OF THE DENSITY OF PRIME NUMBERS WITH A GIVEN LEAST PRIMITIVE ROOT

NUMERICAL CALCULATION OF THE DENSITY OF PRIME NUMBERS WITH A GIVEN LEAST PRIMITIVE ROOT MATHEMATICS OF COMPUTATION Volume 71, Number 240, Pages 1781 1797 S 0025-5718(01)01382-5 Article electronically published on November 28, 2001 NUMERICAL CALCULATION OF THE DENSITY OF PRIME NUMBERS WITH

More information

Impact of Cloud Computing on Healthcare

Impact of Cloud Computing on Healthcare Volume 1, Issue 3, October 2013 ISSN: 2320-9984 (Online) International Journal of Modern Engineering & Management Research Impact of Cloud Computing on Healthcare Yogesh Khullar Department of Computer

More information

Analysis and Comparison of CPU Scheduling Algorithms

Analysis and Comparison of CPU Scheduling Algorithms Analysis and Comparison of CPU Scheduling Algorithms Pushpraj Singh 1, Vinod Singh 2, Anjani Pandey 3 1,2,3 Assistant Professor, VITS Engineering College Satna (MP), India Abstract Scheduling is a fundamental

More information

Bachelor of Engineering (B.Eng.)

Bachelor of Engineering (B.Eng.) General description of degree program Name of degree program: Degree awarded: Specialization: Educational and professional goals: Program duration: Pre-study work experience: Conditions of admission: Traineeship

More information

Algorithm Design and Analysis

Algorithm Design and Analysis Algorithm Design and Analysis LECTURE 27 Approximation Algorithms Load Balancing Weighted Vertex Cover Reminder: Fill out SRTEs online Don t forget to click submit Sofya Raskhodnikova 12/6/2011 S. Raskhodnikova;

More information

MDS UK Patient Support Group Feedback: London Local AA/MDS Support Group Meeting 19/07/12

MDS UK Patient Support Group Feedback: London Local AA/MDS Support Group Meeting 19/07/12 UK Patient Support Group Feedback: London Local AA/ Support Group Meeting 9/07/ Attendees How did you hear about this regional meeting? Attended* Feedback AA 7 (33%) 3 (67%) 7 5 *Attendance figures depend

More information

GREATEST COMMON DIVISOR

GREATEST COMMON DIVISOR DEFINITION: GREATEST COMMON DIVISOR The greatest common divisor (gcd) of a and b, denoted by (a, b), is the largest common divisor of integers a and b. THEOREM: If a and b are nonzero integers, then their

More information

Thread level parallelism

Thread level parallelism Thread level parallelism ILP is used in straight line code or loops Cache miss (off-chip cache and main memory) is unlikely to be hidden using ILP. Thread level parallelism is used instead. Thread: process

More information

Notes on Network Security - Introduction

Notes on Network Security - Introduction Notes on Network Security - Introduction Security comes in all shapes and sizes, ranging from problems with software on a computer, to the integrity of messages and emails being sent on the Internet. Network

More information

DIPLOMADO DE JAVA - OCA

DIPLOMADO DE JAVA - OCA DIPLOMADO DE JAVA - OCA TABLA DE CONTENIDO INTRODUCCION... 3 ESTRUCTURA DEL DIPLOMADO... 4 Nivel I:... 4 Fundamentals of the Java Programming Language Java SE 7... 4 Introducing the Java Technology...

More information

Setting up your own Internet Radio Station

Setting up your own Internet Radio Station Setting up your own Internet Radio Station Author: Streamit B.V. Version: 1.0.0 Date: March 2007 Copyright Agreement Notitions about trademarks LUKAS IS A REGISTERED TRADEMARK OF STREAMIT, SEE WWW.STREAMIT.EU.

More information

Chapter 5: Process Scheduling

Chapter 5: Process Scheduling Chapter 5: Process Scheduling Chapter 5: Process Scheduling 5.1 Basic Concepts 5.2 Scheduling Criteria 5.3 Scheduling Algorithms 5.3.1 First-Come, First-Served Scheduling 5.3.2 Shortest-Job-First Scheduling

More information

Reviewing All Applications & Critiques for a Review Meeting

Reviewing All Applications & Critiques for a Review Meeting proposalcentral Reviewing All Applications & Critiques for a Review Meeting If you need assistance, contact Customer Service by email at pcsupport@altum.com or by phone at 1-800-875-2562 or phone 703-964-5840

More information

Multiprocessor Graphic Rendering Kerey Howard

Multiprocessor Graphic Rendering Kerey Howard Multiprocessor Graphic Rendering Kerey Howard EEL 6897 Lecture Outline Real time Rendering Introduction Graphics API Pipeline Multiprocessing Parallel Processing Threading OpenGL with Java 2 Real time

More information

Common Approaches to Real-Time Scheduling

Common Approaches to Real-Time Scheduling Common Approaches to Real-Time Scheduling Clock-driven time-driven schedulers Priority-driven schedulers Examples of priority driven schedulers Effective timing constraints The Earliest-Deadline-First

More information

Process Scheduling CS 241. February 24, 2012. Copyright University of Illinois CS 241 Staff

Process Scheduling CS 241. February 24, 2012. Copyright University of Illinois CS 241 Staff Process Scheduling CS 241 February 24, 2012 Copyright University of Illinois CS 241 Staff 1 Announcements Mid-semester feedback survey (linked off web page) MP4 due Friday (not Tuesday) Midterm Next Tuesday,

More information

FAQ: BroadLink Multi-homing Load Balancers

FAQ: BroadLink Multi-homing Load Balancers FAQ: BroadLink Multi-homing Load Balancers BroadLink Overview Outbound Traffic Inbound Traffic Bandwidth Management Persistent Routing High Availability BroadLink Overview 1. What is BroadLink? BroadLink

More information

Communication Cost in Big Data Processing

Communication Cost in Big Data Processing Communication Cost in Big Data Processing Dan Suciu University of Washington Joint work with Paul Beame and Paris Koutris and the Database Group at the UW 1 Queries on Big Data Big Data processing on distributed

More information

REAL TIME OPERATING SYSTEMS. Lesson-18:

REAL TIME OPERATING SYSTEMS. Lesson-18: REAL TIME OPERATING SYSTEMS Lesson-18: Round Robin Time Slicing of tasks of equal priorities 1 1. Common scheduling models 2 Common scheduling models Cooperative Scheduling of ready tasks in a circular

More information

Breaking The Code. Ryan Lowe. Ryan Lowe is currently a Ball State senior with a double major in Computer Science and Mathematics and

Breaking The Code. Ryan Lowe. Ryan Lowe is currently a Ball State senior with a double major in Computer Science and Mathematics and Breaking The Code Ryan Lowe Ryan Lowe is currently a Ball State senior with a double major in Computer Science and Mathematics and a minor in Applied Physics. As a sophomore, he took an independent study

More information

Managing Stop-Go Capital Flows in EM Asia: So Far, So Good

Managing Stop-Go Capital Flows in EM Asia: So Far, So Good Managing Stop-Go Capital Flows in EM Asia: So Far, So Good Andrew Filardo Bank for International Settlements Prepared for the Central Banks of Finland and Austria Joint Conference on European Economic

More information

Lecture Outline Overview of real-time scheduling algorithms Outline relative strengths, weaknesses

Lecture Outline Overview of real-time scheduling algorithms Outline relative strengths, weaknesses Overview of Real-Time Scheduling Embedded Real-Time Software Lecture 3 Lecture Outline Overview of real-time scheduling algorithms Clock-driven Weighted round-robin Priority-driven Dynamic vs. static Deadline

More information

IMPROVEMENT OF DIGITAL IMAGE RESOLUTION BY OVERSAMPLING

IMPROVEMENT OF DIGITAL IMAGE RESOLUTION BY OVERSAMPLING ABSTRACT: IMPROVEMENT OF DIGITAL IMAGE RESOLUTION BY OVERSAMPLING Hakan Wiman Department of Photogrammetry, Royal Institute of Technology S - 100 44 Stockholm, Sweden (e-mail hakanw@fmi.kth.se) ISPRS Commission

More information

Round Robin (RR) ACSC 271 Operating Systems. RR Example. RR Scheduling. Lecture 9: Scheduling Algorithms

Round Robin (RR) ACSC 271 Operating Systems. RR Example. RR Scheduling. Lecture 9: Scheduling Algorithms Round Robin (RR) ACSC 271 Operating Systems Lecture 9: Scheduling Algorithms Each process gets a small unit of CPU time (time quantum), usually 10-100 milliseconds. After this time has elapsed, the process

More information

Contributions to Gang Scheduling

Contributions to Gang Scheduling CHAPTER 7 Contributions to Gang Scheduling In this Chapter, we present two techniques to improve Gang Scheduling policies by adopting the ideas of this Thesis. The first one, Performance- Driven Gang Scheduling,

More information

Using a Digital Recorder with Dragon NaturallySpeaking

Using a Digital Recorder with Dragon NaturallySpeaking Using a Digital Recorder with Dragon NaturallySpeaking For those desiring to record dictation on the go and later have it transcribed by Dragon, the use of a portable digital dictating device is a perfect

More information

Scheduling Parallel Jobs with Monotone Speedup 1

Scheduling Parallel Jobs with Monotone Speedup 1 Scheduling Parallel Jobs with Monotone Speedup 1 Alexander Grigoriev, Marc Uetz Maastricht University, Quantitative Economics, P.O.Box 616, 6200 MD Maastricht, The Netherlands, {a.grigoriev@ke.unimaas.nl,

More information

Applying Fixed Route Principles To Improve Paratransit Runcutting. Keith Forstall

Applying Fixed Route Principles To Improve Paratransit Runcutting. Keith Forstall Applying Fixed Route Principles To Improve Paratransit Runcutting Keith Forstall Why is runcutting important? Scheduling algorithms are designed to schedule trips efficiently They depend on vehicle capacity

More information

Kerrighed: use cases. Cyril Brulebois. Kerrighed. Kerlabs

Kerrighed: use cases. Cyril Brulebois. Kerrighed. Kerlabs Kerrighed: use cases Cyril Brulebois cyril.brulebois@kerlabs.com Kerrighed http://www.kerrighed.org/ Kerlabs http://www.kerlabs.com/ 1 / 23 Introducing Kerrighed What s Kerrighed? Single-System Image (SSI)

More information

Shanghai R&D Vacancies August 2014 PV, PE, Intern

Shanghai R&D Vacancies August 2014 PV, PE, Intern RD Shanghai R&D Vacancies August 2014 PV, PE, Intern 1. Lead Software Engineer- Routing (Req#: 9528) Responsible for development and maintenance of signal routing in EDI platform (NanoRoute). Implementation

More information

Comparing PWM & MPPT Charge Controllers

Comparing PWM & MPPT Charge Controllers Comparing PWM & MPPT Charge Controllers Contents 1.0 Differences between PWM & MPPT...2 1.1 PWM Charge Controllers.3 1.2 MPPT Charge Controllers 4 2.0 Pros and Cons of Both Types of Charge Controllers..5

More information

Guided Performance Analysis with the NVIDIA Visual Profiler

Guided Performance Analysis with the NVIDIA Visual Profiler Guided Performance Analysis with the NVIDIA Visual Profiler Identifying Performance Opportunities NVIDIA Nsight Eclipse Edition (nsight) NVIDIA Visual Profiler (nvvp) nvprof command-line profiler Guided

More information

CPU Scheduling Outline

CPU Scheduling Outline CPU Scheduling Outline What is scheduling in the OS? What are common scheduling criteria? How to evaluate scheduling algorithms? What are common scheduling algorithms? How is thread scheduling different

More information

Online Supplement for Maximizing throughput in zero-buffer tandem lines with dedicated and flexible servers by Mohammad H. Yarmand and Douglas G.

Online Supplement for Maximizing throughput in zero-buffer tandem lines with dedicated and flexible servers by Mohammad H. Yarmand and Douglas G. Online Supplement for Maximizing throughput in zero-buffer tandem lines with dedicated and flexible servers by Mohammad H Yarmand and Douglas G Down Appendix A Lemma 1 - the remaining cases In this appendix,

More information

Schedulers. Operating System. CPU-IO Burst Cycle. Preemptive Scheduling. Scheduling Criteria. Question

Schedulers. Operating System. CPU-IO Burst Cycle. Preemptive Scheduling. Scheduling Criteria. Question Schedulers Operating System Scheduling (h.5) Short-Term Which process gets the PU? Fast, since once per 00 ms Long-Term (batch) Which process gets the Ready? Medium-Term (Unix) Which Ready process to memory?

More information

Portfolio Replication Variable Annuity Case Study. Curt Burmeister Senior Director Algorithmics

Portfolio Replication Variable Annuity Case Study. Curt Burmeister Senior Director Algorithmics Portfolio Replication Variable Annuity Case Study Curt Burmeister Senior Director Algorithmics What is Portfolio Replication? To find a portfolio of assets whose value is equal to the value of a liability

More information

P900 SERIES PORTABLE HYDROSTATIC TESTER

P900 SERIES PORTABLE HYDROSTATIC TESTER P900 SERIES PORTABLE HYDROSTATIC TESTER MODELMODEL P900-SM ( STANDARD MODEL) SHOWN IN PHOTO coated square tubular frame on wheels for easy transport. PANEL MOUNTED CONTROLS: 4 dial gauges, pump regulator

More information

Lecture 13: The Knapsack Problem

Lecture 13: The Knapsack Problem Lecture 13: The Knapsack Problem Outline of this Lecture Introduction of the 0-1 Knapsack Problem. A dynamic programming solution to this problem. 1 0-1 Knapsack Problem Informal Description: We have computed

More information

Systems of Linear Equations

Systems of Linear Equations Systems of Linear Equations Beifang Chen Systems of linear equations Linear systems A linear equation in variables x, x,, x n is an equation of the form a x + a x + + a n x n = b, where a, a,, a n and

More information

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

Luby s Alg. for Maximal Independent Sets using Pairwise Independence Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent

More information

ONLINE CPD FOR SOCIAL SERVICES BUYING YOUR COURSE/S

ONLINE CPD FOR SOCIAL SERVICES BUYING YOUR COURSE/S RSA Tel 086 1000 381 International Tel +27 21 975 2602 Email info.social@ecpd.co.za ONLINE CPD FOR SOCIAL SERVICES BUYING YOUR COURSE/S Page 1 RSA Tel 086 1000 381 International Tel +27 21 975 2602 Email

More information

Welcome to the course Algorithm Design

Welcome to the course Algorithm Design HEINZ NIXDORF INSTITUTE University of Paderborn Algorithms and Complexity Welcome to the course Algorithm Design Summer Term 2011 Friedhelm Meyer auf der Heide Lecture 6, 20.5.2011 Friedhelm Meyer auf

More information

Decentralized Utility-based Sensor Network Design

Decentralized Utility-based Sensor Network Design Decentralized Utility-based Sensor Network Design Narayanan Sadagopan and Bhaskar Krishnamachari University of Southern California, Los Angeles, CA 90089-0781, USA narayans@cs.usc.edu, bkrishna@usc.edu

More information

Chapter 8: Bags and Sets

Chapter 8: Bags and Sets Chapter 8: Bags and Sets In the stack and the queue abstractions, the order that elements are placed into the container is important, because the order elements are removed is related to the order in which

More information

120 Inch Telescope Dome Crane Load Testing

120 Inch Telescope Dome Crane Load Testing State Required Crane Inspection Annual Crane Inspection An annual inspection of our crane is required by law as indicated below. Division 1. Department of Industrial Relations Chapter 4. Division of Industrial

More information

OpenFlow Based Load Balancing

OpenFlow Based Load Balancing OpenFlow Based Load Balancing Hardeep Uppal and Dane Brandon University of Washington CSE561: Networking Project Report Abstract: In today s high-traffic internet, it is often desirable to have multiple

More information

CPU Scheduling. Basic Concepts. Basic Concepts (2) Basic Concepts Scheduling Criteria Scheduling Algorithms Batch systems Interactive systems

CPU Scheduling. Basic Concepts. Basic Concepts (2) Basic Concepts Scheduling Criteria Scheduling Algorithms Batch systems Interactive systems Basic Concepts Scheduling Criteria Scheduling Algorithms Batch systems Interactive systems Based on original slides by Silberschatz, Galvin and Gagne 1 Basic Concepts CPU I/O Burst Cycle Process execution

More information

Certificate IV in Human Resources. Name Other. Tutorial Support if required - Bundaberg Information Version Control

Certificate IV in Human Resources. Name Other. Tutorial Support if required - Bundaberg Information Version Control Location Start Date: Start Date: Start Date: Start Date: Thursday, 14 July 2016 Start Date: End Date: End Date: End Date: End Date: Thursday, 15 September 2016 End Date: Start Time: Start Time: Start Time:

More information

70-563 (VB) - Pro: Designing and Developing Windows Applications Using the Microsoft.NET Framework 3.5

70-563 (VB) - Pro: Designing and Developing Windows Applications Using the Microsoft.NET Framework 3.5 70-563 (VB) - Pro: Designing and Developing Windows Applications Using the Microsoft.NET Framework 3.5 Course Introduction Course Introduction Chapter 01 - Windows Forms and Controls Windows Forms Demo

More information

2. is the number of processes that are completed per time unit. A) CPU utilization B) Response time C) Turnaround time D) Throughput

2. is the number of processes that are completed per time unit. A) CPU utilization B) Response time C) Turnaround time D) Throughput Import Settings: Base Settings: Brownstone Default Highest Answer Letter: D Multiple Keywords in Same Paragraph: No Chapter: Chapter 5 Multiple Choice 1. Which of the following is true of cooperative scheduling?

More information

LOAD BALANCING AND ADMISSION CONTROL OF A PARLAY X APPLICATION SERVER

LOAD BALANCING AND ADMISSION CONTROL OF A PARLAY X APPLICATION SERVER This is an author produced version of a paper presented at the 17th Nordic Teletraffic Seminar (NTS 17), Fornebu, Norway, 25-27 August, 2004. This paper may not include the final publisher proof-corrections

More information

Finding the Measure of Segments Examples

Finding the Measure of Segments Examples Finding the Measure of Segments Examples 1. In geometry, the distance between two points is used to define the measure of a segment. Segments can be defined by using the idea of betweenness. In the figure

More information