The Effect of Network Total Order, Broadcast, and Remote-Write Capability on Network-Based Shared Memory Computing

Robert Stets, Sandhya Dwarkadas, Leonidas Kontothanassis†, Michael L. Scott

Department of Computer Science, University of Rochester, Rochester, NY 14627-0226
†Compaq Cambridge Research Lab, One Kendall Sq., Bldg. 700, Cambridge, MA 02139

DRAFT COPY: Please do not redistribute.

Abstract

Emerging system-area networks provide a variety of features that can dramatically reduce network communication overhead. Such features include reduced latency, protected remote memory access, cheap broadcasting, and ordering guarantees for network packets. Some of these features come at the expense of scalability for the network fabric or at a significant implementation cost. In this paper we evaluate the impact of these features on the implementation of software distributed shared memory (SDSM) systems, in particular, on the Cashmere protocol. Cashmere has been implemented for the Compaq Memory Channel network, which has support for write access to remote memory, inexpensive broadcast, and total ordering of network packets. Our evaluation framework divides SDSM protocol communication into three areas: shared data propagation, protocol meta-data maintenance, and synchronization, and demonstrates the performance impact of exploiting Memory Channel features in each of these three areas.
We found that eight of eleven well-known benchmark applications perform better on the base version of Cashmere, which maximizes its leverage of the Memory Channel features, than on a comparable version that uses explicit messages (and no broadcast) for all protocol communication. The performance difference is 37% in one application, which dynamically distributes its work, and 11% or less in the other seven applications. The remaining three applications show no performance differences. In general, the differences are due to reduced protocol-induced application perturbation and more efficient meta-data maintenance. Reduced perturbation accounts for the large 37% improvement by decreasing interference with the application's dynamic work distribution mechanism.

In addition, we have also investigated home node migration to reduce shared data propagation. Our results show that this optimization recoups the performance lost by abandoning the use of special Memory Channel features. In fact, the optimization is so effective that three of the applications perform 18% to 34% better on a protocol with migration and explicit messages than on our base protocol that fully leverages the Memory Channel. The message-based protocol has the additional advantage of allowing shared memory to grow beyond the amount that can be mapped through the network interface.

This work was supported in part by NSF grants CDA- , CCR- , and CCR- ; and an external research grant from Compaq.
1 Introduction

Clusters of workstations connected by commodity networks have long provided aggregate power comparable to special-purpose parallel machines. In practice, however, performance has been limited by the relatively high cost of inter-processor communication. Recent trends, particularly the introductions of commodity-priced symmetric multiprocessors (SMPs) and low-latency (in the microseconds) system area networks (SANs), have improved the potential performance of clusters. In addition to low messaging latency, many SANs provide other overhead-reducing features such as remote memory access, inexpensive broadcast, and total ordering in the network [10, 15, 16]. On SMP clusters connected by SANs, communication overhead can be greatly reduced. Communication within the same node can occur through hardware, while across SMPs, communication overhead can be ameliorated by the high performance network.

Since shared memory is available in hardware within SMP nodes, perhaps the most natural programming paradigm for these clusters is software distributed shared memory (SDSM) since it utilizes the hardware within a node efficiently. Several studies have already determined the positive impact of SMP-based clusters on SDSM performance [12, 14, , 21, 22, 25]. Many of these same studies utilized low latency networks. However, the benefits of advanced network features (for example, remote memory access) have not been directly quantified.

In this paper, we examine the impact of advanced networking features on the performance of the state-of-the-art Cashmere-2L [25] protocol. The Cashmere protocol uses the virtual memory subsystem to track data accesses, allows multiple concurrent writers, employs home nodes (i.e. maintains one master copy of each shared data page), and leverages shared memory within SMPs to reduce protocol overhead. In practice, Cashmere-2L has been shown to have very good performance [12, 17, 25].
Cashmere was originally designed for a cluster consisting of AlphaServer SMPs connected by a Compaq Memory Channel network, which offers low messaging latencies, write access to remote memory, inexpensive broadcast, and total ordering. Cashmere therefore attempted to maximize performance by placing shared data directly in remotely accessible memory, using broadcast to replicate the directory among the nodes, and relying on network total order and reliability to avoid acknowledging the receipt of meta-data information.
The purpose of this paper is to evaluate the performance impact of each of these design decisions. We have structured our evaluation to determine not only the overall impact of the special Memory Channel features, but also their impact on protocol communication and related design. In general, an SDSM protocol incurs communication in three areas: the propagation of shared data, the maintenance of internal protocol data structures (called protocol meta-data), and synchronization. To evaluate the impact of network support in these terms, we have constructed six Cashmere variants. Four of the variants are used to isolate the impact of the Memory Channel features on protocol communication in the above areas. The final two variants employ a protocol optimization that allows home nodes to migrate to active writers, thereby reducing remote propagation of shared data. This optimization is only possible when shared data is not in remotely accessible memory, since migration of remotely accessible memory is an expensive operation involving synchronization of all the nodes mapping the data.

Our results show that the Memory Channel features improve performance by an average of 8% across eleven standard benchmarks. The largest improvement is 37% and occurs in a program with dynamic work distribution. This application benefits from reduced protocol-induced overhead imbalances, resulting in more effective work distribution. Importantly, the use of the Memory Channel features never degrades performance in any of the applications. In terms of protocol design, meta-data maintenance benefits the most from the network support. In addition, we found that home node migration can recover most of the benefits lost by not using remote write access to propagate shared data, and importantly, allows shared data size to scale beyond our network's limited remotely-accessible memory space (footnote 1). Three of the applications actually obtain their best performance (by 18-34%) on a protocol with migration and explicit messages.

The next section discusses the Memory Channel and its special features, along with the Cashmere protocol. Section 3 evaluates the impact of the Memory Channel features and the home node migration optimization. Section 4 covers related work, and Section 5 outlines our conclusions.

Footnote 1: Most current commodity remote access networks have a limited remotely-accessible memory space. Methods to eliminate this restriction are a focus of ongoing research [6, 26].
2 Protocol Variants and Implementation

Cashmere was designed for SMP clusters connected by a high performance network, specifically, Compaq's Memory Channel network [15]. Earlier work [12, 14, , 21, 22, 25] on Cashmere and other systems has quantified the benefits of SMP nodes to SDSM performance. In this paper, we will examine the performance impact of the network features exploited.

We begin by providing an overview of the Memory Channel network and its programming interface. Following this overview is a description of the Cashmere protocol. In keeping with the focus of this paper, the design discussion will primarily focus on the aspects of network communication. A discussion of the design decisions related to the SMP nodes can be found in an earlier paper [25].

2.1 Memory Channel

The Memory Channel is a reliable, low-latency network. The hardware provides remote-write capability, which allows processors to modify remote memory without remote processor intervention. The Memory Channel uses a memory-mapped, programmed I/O interface. To work with remotely-accessible memory, a processor must attach to regions in the Memory Channel's address space. The regions can be mapped for either transmit or receive. The physical addresses of transmit regions map to I/O space, in particular, to addresses on the Memory Channel's network adapter. I/O space is uncacheable, but writes can be coalesced in the processor's write buffer. Receive regions map directly to physical memory.

After initial connection setup, the network can be accessed directly from user level. Writes to transmit regions are routed to the network adapter (on the PCI bus), which automatically constructs and launches a data message. Upon message reception, a node's adapter performs a DMA access to main memory if the region is mapped for receive. Otherwise, the message is dropped.

Normally, a write to a transmit region is not reflected to a corresponding receive region on the source node. By placing a region in loopback mode, however, we can arrange for the source adapter to include itself in the outgoing message's destination. The message will move through the hub and arrive back at the source, where it will be processed as a normal incoming message. The adapter will place the data in the appropriate receive region.
The Memory Channel guarantees total order: all writes to the network are observed in the same order by all receivers. This guarantee is provided by a serializing hub that connects all the machines in the cluster. The hub is bus-based, which ensures serialization and also accounts for the network's inexpensive broadcast support. (The second generation of the Memory Channel has a crossbar hub. Broadcast support will still be available, however at a higher cost.)

2.2 Protocol Overview

Cashmere uses the virtual memory (VM) subsystem to track accesses to shared data, and so naturally, the unit of coherence is a virtual memory page (8K on our system). To use Cashmere, an application must be data-race-free [1]. Simply stated, one process must synchronize with another in order to see its modifications. Also, all synchronization primitives must be visible to the system. These primitives can be constructed from basic acquire and release operations. The former is used to enter a critical section; the latter is used to exit.

Cashmere implements a variant of delayed consistency [9]. In this variant, data modifications become visible at a processor at the time of its next acquire operation. This model lies in between the consistency models implemented by Munin [5] and TreadMarks [2]. In the former, modifications become visible at the time of the modifier's release operation. In the latter, modifications become visible at the time of the next causally related acquire.

In Cashmere, each page of shared memory has a single, distinguished home node. Home nodes are initially assigned using a first-touch policy. The home node collects all modifications into a master copy of the page. Sharing set information and home node locations are maintained in a directory containing one entry per page.

The main protocol entry points are page faults and synchronization operations. On a page fault, the protocol updates the sharing set information in the directory and obtains an up-to-date copy of the page from the home node. If the fault is due to a write access, the protocol will also create a pristine copy (called a twin) of the page and add the page to the dirty list. As an optimization in the write fault handler, a page that is shared by only one node is moved into exclusive mode. In this case, the twin and dirty list operations are skipped, and the page will incur no protocol overhead until another sharer emerges.
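The write-fault path and the twin it creates can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Cashmere's implementation: the `Node` class, the function names, and the byte-granularity comparison are all hypothetical, and real page protection and network transfer are replaced by plain Python objects. It also shows how a twin lets a node later recover its own modifications as a diff, which is what the protocol does at a release.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 8192  # 8K coherence unit, as stated in the text


@dataclass
class Node:
    """Per-node protocol state; every name here is illustrative."""
    node_id: int
    pages: dict = field(default_factory=dict)    # page number -> working copy (bytearray)
    twins: dict = field(default_factory=dict)    # page number -> pristine copy (bytes)
    dirty: list = field(default_factory=list)    # pages to diff at the next release
    exclusive: set = field(default_factory=set)  # pages held in exclusive mode


def write_fault(node, page_no, sharers, master):
    """Join the sharing set, fetch the page from the home copy, then twin it."""
    sharers.add(node.node_id)                  # directory update
    node.pages[page_no] = bytearray(master)    # up-to-date copy from the home node
    if sharers == {node.node_id}:
        node.exclusive.add(page_no)            # sole sharer: skip twin and dirty list
        return
    node.twins[page_no] = bytes(node.pages[page_no])
    node.dirty.append(page_no)


def make_diff(node, page_no):
    """Compare a dirty page with its twin; return (offset, new_byte) pairs."""
    twin, page = node.twins[page_no], node.pages[page_no]
    return [(i, page[i]) for i in range(len(page)) if page[i] != twin[i]]


def apply_diff(master, diff):
    """Merge a diff into the home node's master copy."""
    for offset, value in diff:
        master[offset] = value
```

For example, a node that faults on a page already shared by node 0, writes one byte, and then diffs the page produces a single-entry diff that the home can merge into its master copy; a node that is the only sharer skips the twin entirely.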
At the next release operation, the protocol examines each page in the dirty list and compares the page to its twin in order to uncover the modifications. These modifications are placed in a diff message and sent to the home node to be incorporated into the master copy of the page. Upon completion of the diff message, the protocol downgrades permissions on the dirty pages and sends write notices to all nodes in the sharing set. These write notices are accumulated into a list at the destination and processed at the node's next acquire operation. All pages named by write notices are invalidated as part of the acquire.

2.3 Protocol Communication

As described earlier, protocol communication can be broken down into three areas: shared data propagation, protocol meta-data maintenance, and synchronization. In order to isolate the effects of the special Memory Channel features on these three areas, we have prepared six variants of the Cashmere protocol. Table 1 lists the variants and characterizes their use of the Memory Channel. For each of the areas of protocol communication, the protocols either leverage the full Memory Channel capabilities (i.e. remote write access, total ordering, and inexpensive broadcast) or instead send explicit messages between processors. We assume a reliable network (as is common in current SANs). If we wish to establish ordering, however, explicit messages require an acknowledgement.

Protocol Name   Data       Meta-data   Synchronization   Home Migration
CSM-DMS         MC         MC          MC                No
CSM-MS          Explicit   MC          MC                No
CSM-S           Explicit   Explicit    MC                No
CSM-None        Explicit   Explicit    Explicit          No
CSM-MS-Mg       Explicit   MC          MC                Yes
CSM-None-Mg     Explicit   Explicit    Explicit          Yes

Table 1: These protocol variants have been chosen to isolate the performance impact of special network features on the areas of SDSM communication. Use of special Memory Channel features is denoted by an "MC" under the area of communication. Otherwise, explicit messages are used. The use of Memory Channel features is also denoted in the protocol suffix (D, M, and/or S), as is the use of home node migration (Mg).
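The naming convention in the Table 1 caption is regular enough to decode mechanically. The helper below is a hypothetical illustration (the function name and the dictionary keys are not from the paper): it maps a variant name to the communication style used in each area, following only the suffix rules stated in the caption.

```python
def describe_variant(name):
    """Decode a variant name such as 'CSM-MS-Mg' into its Table 1 row."""
    parts = name.split("-")
    if len(parts) not in (2, 3) or parts[0] != "CSM":
        raise ValueError(f"not a Cashmere variant name: {name!r}")
    # 'None' means no area uses Memory Channel features.
    suffix = "" if parts[1] == "None" else parts[1]
    if any(letter not in "DMS" for letter in suffix):
        raise ValueError(f"unknown area letters in {name!r}")
    if len(parts) == 3 and parts[2] != "Mg":
        raise ValueError(f"unknown trailing suffix in {name!r}")
    return {
        "data": "MC" if "D" in suffix else "Explicit",
        "meta_data": "MC" if "M" in suffix else "Explicit",
        "synchronization": "MC" if "S" in suffix else "Explicit",
        "home_migration": len(parts) == 3,
    }
```

Calling `describe_variant("CSM-MS-Mg")` yields explicit messages for data, Memory Channel features for meta-data and synchronization, and home migration enabled.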
2.3.1 CSM-DMS: Data, Meta-data, and Synchronization using Memory Channel

The base protocol, denoted CSM-DMS, is the same Cashmere-2L protocol described in our study on the effects of SMP clusters [25]. As described in the subsequent paragraphs, this protocol fully exploits the Memory Channel for all SDSM communication: to propagate shared data, to maintain protocol meta-data, and for synchronization. The following text describes how the features are leveraged.

Data: Shared data is fetched from the home node and modifications are written back, in the form of diffs, to the home node (footnote 2). The fetch operation could be optimized by a remote read operation or by allowing the home node to write the data directly to the working address on the requesting node. Unfortunately, the first optimization is not available on the Memory Channel. The second optimization requires shared data to be mapped at distinct Memory Channel addresses on each node. With only 128M of Memory Channel address space, this significantly limits the maximum data set size. (For eight nodes, the maximum data set would be only about 16M.) For this reason, CSM-DMS does not use the second optimization either.

Footnote 2: An earlier Cashmere study [17] investigated using write-through to propagate data modifications. Diffs were found to use bandwidth more efficiently than write-through, and to provide better performance.

Instead of using Memory Channel address space for all shared data copies, CSM-DMS uses it for home node copies only. This still limits data set size, but the limit is much higher. With home node copies in Memory Channel space, a processor can use remote writes to apply diffs at release time. This usage avoids the need to interrupt a home node processor.

To avoid race conditions, Cashmere must be sure all diffs are completed before exiting a critical section. Rather than requiring home nodes to return diff acknowledgements, CSM-DMS instead relies on the Memory Channel's total ordering. CSM-DMS performs all diff operations and then completes the release operation by resetting the corresponding synchronization location in Memory Channel space. Since the network is totally ordered, the diff is guaranteed to be completed by the time other processors observe the completion of the release operation.

Meta-data: System-wide meta-data in CSM-DMS consists of the page directory and write notices. CSM-DMS replicates the page directory on each node and then uses a remote write to broadcast all changes. Cashmere also uses remote writes to deliver write notices to a well-known location on each node. At an acquire, the node simply reads the write notices from that location. As with diffs, Cashmere takes advantage of the guaranteed network ordering to avoid write notice acknowledgements.

Synchronization: Application locks, barriers, and flags all leverage the Memory Channel's broadcast and write ordering capabilities. Locks are represented by an 8-entry array in Memory Channel space, and by a test-and-set flag on each node. A process begins a global lock acquire operation by first acquiring the local test-and-set lock. Then the process asserts its node entry in the 8-entry array, waits for the write to appear via loop-back, and then reads the entire array. If any of the other entries are set, the process resets its entry, backs off, and tries again. If no other entries are set, the lock has been acquired. Barriers are represented by an 8-entry array, a "sense" variable in Memory Channel space, and a local counter on each node. A processor atomically reads and increments the local node counter to determine if it is the last processor on the node to enter the barrier. If so, the processor updates the node's entry in the 8-entry array. A single master processor waits for all nodes to arrive and then toggles the sense variable. This releases all the nodes, which are spinning on the sense variable. Flags simply use the Memory Channel's remote write and broadcast.

2.3.2 CSM-MS: Meta-data and Synchronization using Memory Channel

As mentioned above, CSM-DMS places home node page copies in the Memory Channel address space, which limits the maximum data set size. CSM-MS does not place shared data in Memory Channel space and so avoids network-induced limitations on data set size. The tradeoff is that CSM-MS cannot leverage the Memory Channel to optimize diff communication. Instead, diffs are sent as explicit messages, which must interrupt the home node and which require explicit acknowledgements. (Interrupts are achieved via Memory Channel writes to a shared flag, coupled with receive-side polling on loop back edges.) In CSM-MS, meta-data and synchronization still leverage all Memory Channel features.

2.3.3 CSM-S: Synchronization using Memory Channel
The third protocol variant, CSM-S, only leverages the Memory Channel for synchronization. Explicit messages are used both to propagate shared data and to maintain meta-data. Instead of broadcasting a directory change, a process must send the change to the home node in an explicit message. The home node updates the entry and acknowledges the request. The home node is the only node guaranteed to have an up-to-date directory entry.

In most cases, a separate directory update (or read) message can be avoided. Instead, the update can be piggybacked onto an existing message. For example, a directory update is implicit in a page fetch request and so can be piggybacked. Also, write notices always follow diff operations, so the home node can simply piggyback the sharing set (needed to identify where to send write notices) onto the diff acknowledgment. In fact, an explicit directory message is needed only when a page is invalidated (footnote 3).

Footnote 3: The protocol could be designed to lazily downgrade the directory entry in this case. However, the directory entry would often over-estimate the number of sharers and compromise the effectiveness of Cashmere's exclusive mode optimization.

2.3.4 CSM-None: No Use of Special Memory Channel Features

The fourth protocol, CSM-None, uses explicit messages (and acknowledgments) for all communication. This protocol variant relies only on low-latency messaging, and so could easily be ported to other low-latency network architectures. Rather than inter-processor interrupts, our low-latency messaging relies on the efficient polling mechanism described above. Earlier Cashmere work [17] found that the expensive kernel transition incurred by inter-processor interrupts limited the benefits of the low-latency network. In our implementation, we poll a well-known message arrival flag that is updated through remote-write. This mechanism should be considered independent of our above use of remote write, since efficient polling can be implemented on other network interfaces [10, 26] that lack the ability to write to arbitrary, user-defined locations.

2.3.5 CSM-MS-Mg and CSM-None-Mg: Home Node Migration

All of the above protocol variants use first-touch home node assignment [18]. Home assignment is extremely important because processors on the home node write directly to the master copy and so do not incur costly twin and diff overheads. If a page has multiple writers during the course of execution, protocol overhead can potentially be reduced by migrating the home node to the active writers.

Due to the high cost of remapping Memory Channel addresses, migrating home nodes cannot be used when data is remotely accessible. Hence, CSM-MS-Mg and CSM-None-Mg both keep shared data in private memory, and allow the home to migrate during execution. When a processor incurs a write fault, the protocol checks the local copy of the directory to see if the home is actively writing the page. If not, a migration request is sent to the home. The request is granted if received when the home is not writing the page. If granted, the home simply changes the directory entry to point to the new home.

CSM-MS-Mg uses Memory Channel features for meta-data maintenance and for synchronization, while CSM-None-Mg uses only explicit messages. The latter protocol can suffer from unnecessary migration requests since the cached directory entries may be out-of-date. We do not present CSM-S-Mg since the results of using the Memory Channel for synchronization are qualitatively the same regardless of whether the home node is fixed or migrating.

3 Results

We begin this section with a brief description of our hardware platform and our application suite. Next, we discuss the results of our investigation of the impact of Memory Channel features and the home node migration optimization.

3.1 Platform and Basic Operation Costs

Our experimental environment consists of four DEC AlphaServer 2100 4/233 computers. Each AlphaServer is equipped with four 21064A processors operating at 233 MHz and with 256 MB of shared memory, as well as a Memory Channel network interface. The 21064A has two on-chip caches: a 16K instruction cache and 16K data cache. The off-chip secondary cache size is 1 Mbyte. A cache line is 64 bytes. Each AlphaServer runs Digital Unix 4.0D with TruCluster v. 1.5 (Memory Channel) extensions. The systems execute in multi-user mode, but with the exception of normal Unix daemons no other processes were active during the tests. In order to increase cache efficiency, application processes are pinned to a processor at startup. No other processors are connected to the Memory Channel. Execution times represent the median values of three runs.

On our platform, the Memory Channel has a point-to-point bandwidth of approximately 33 MBytes/sec. One-way latency for a 64-bit remote-write operation is 4.3 μsecs. In practice, the round-trip latency for a null message in Cashmere is 39 μsecs. This time includes the transfer of the message header and the invocation of a null handler function.
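As a rough worked example, the figures quoted above give a lower bound on the time needed to move one 8K page across the Memory Channel. The sketch below is back-of-the-envelope arithmetic, not a measurement from the paper. It assumes 1 MByte = 10^6 bytes, and its second estimate simply adds the null-message round trip to the wire time, which is a simplifying assumption of this example only.

```python
PAGE_BYTES = 8 * 1024            # 8K coherence unit (Section 2.2)
BANDWIDTH_BYTES_PER_SEC = 33e6   # ~33 MBytes/sec, assuming 1 MByte = 10**6 bytes
NULL_ROUND_TRIP_USEC = 39.0      # round-trip latency of a null message


def page_wire_time_usec(nbytes=PAGE_BYTES, bandwidth=BANDWIDTH_BYTES_PER_SEC):
    """Time to push nbytes through the link at full bandwidth, in microseconds."""
    return nbytes / bandwidth * 1e6


def page_fetch_estimate_usec():
    """Crude estimate: request/reply overhead plus wire time for one page."""
    return NULL_ROUND_TRIP_USEC + page_wire_time_usec()
```

Under these assumptions the wire time alone is about 248 μsecs, so bandwidth rather than the 39 μsec message latency dominates the cost of a page transfer.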
Operation               Memory Channel Features   Explicit Messages
Diff (μsecs)            290-                      -760
Lock Acquire (μsecs)
Barrier (μsecs)

Table 2: Basic operation costs at 16 processors. Diff cost varies according to the size of the diff.

Table 3: Data set sizes and sequential execution time of applications.
Columns: Program; Problem Size; Time (sec.)
Programs listed: Barnes, CLU, EM3D, Gauss, Ilink, SOR, TSP, Volrend, Water-nsquared, Water-spatial
Problem sizes listed: 128K bodies (26 Mbytes); 48x48 (33 Mbytes); 2500x2500 (50 Mbytes); 64000 nodes (52 Mbytes); 48x48 (33 Mbytes); 3072x4096 (50 Mbytes); CLP (15 Mbytes); 17 cities (1 Mbyte); Head (23 Mbytes); 9261 mols. (16 Mbytes); 9261 mols. (6 Mbytes)

As described earlier, Memory Channel features can be used to significantly reduce the cost of diffs, directory updates, write notice propagation, and synchronization. Table 2 shows the costs for diff operations, lock acquires, and barriers, both when leveraging and not leveraging the Memory Channel features. The cost of diff operations varies according to the size of the diff. Directory updates, write notices, and flag synchronization all use the Memory Channel's remote-write and total ordering features. (Directory updates and flag synchronization also rely on the inexpensive broadcast support.) Without these features, these operations are accomplished via explicit messages. Directory updates are small messages with simple handlers, so their cost is only slightly more than the cost of a null message. The cost of write notices will depend greatly on the write notice count and destinations. Write notices sent to different destinations can be overlapped, thus reducing the operation's overall latency. Flags are inherently broadcast operations, but again the flag update messages to the processors can be overlapped so perceived latency should not be much more than that of a null message.
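To make the lock acquire operation costed in Table 2 more concrete, the following single-threaded simulation sketches the array-based global lock described in Section 2.3.1. It is an illustration under stated assumptions rather than Cashmere code: the `Hub` class and function names are hypothetical, the per-node test-and-set lock is omitted, and the serializing hub is modeled by applying each write to every node's replica in one global order, so the loopback wait is satisfied trivially.

```python
NUM_NODES = 8  # the lock array has one entry per node


class Hub:
    """Stand-in for the serializing hub: every write reaches all replicas in one order."""

    def __init__(self, num_nodes=NUM_NODES):
        self.replicas = [[0] * num_nodes for _ in range(num_nodes)]

    def broadcast_write(self, index, value):
        # The sender's own replica is updated too, which models loopback mode.
        for replica in self.replicas:
            replica[index] = value


def try_acquire(hub, node_id):
    """One attempt at the global lock; returns True if the lock was obtained."""
    hub.broadcast_write(node_id, 1)  # assert this node's entry
    # On real hardware the node would spin here until its write returns via loopback.
    view = hub.replicas[node_id]
    if any(view[i] for i in range(len(view)) if i != node_id):
        hub.broadcast_write(node_id, 0)  # contention: reset; the caller backs off and retries
        return False
    return True


def release(hub, node_id):
    """Release the global lock by clearing this node's entry."""
    hub.broadcast_write(node_id, 0)
```

Because all writes are observed in the same order everywhere, two contending nodes cannot both see an array containing only their own entry, which is the property that makes acknowledgements unnecessary in this scheme.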
3.2 Application Suite

Our applications are well-known benchmarks that have not been modified from their original distribution.

Barnes: an N-body simulation from the TreadMarks [2] distribution (and based on the same application in the SPLASH-1 [23] suite), using the hierarchical Barnes-Hut method. Bodies in the simulation space are placed into nodes in a tree structure based on their physical locations, and this tree structure is used to control the computation. Synchronization consists of barriers between phases.

CLU: from the SPLASH-2 [27] benchmark. The kernel factors a matrix into the product of a lower-triangular and an upper-triangular matrix. Work is distributed by splitting the matrix into blocks and assigning each block to a processor. Blocks modified by a single processor are allocated contiguously in order to increase spatial locality. Barriers are used for synchronization.

LU: also from SPLASH-2. The implementation is identical to CLU except that blocks are not allocated contiguously, resulting in multiple write sharers per coherence block.

EM3D: a program to simulate electromagnetic wave propagation through 3D objects [8]. The primary computational element is a set of magnetic and electric nodes that are equally distributed among the processors. These nodes are only shared amongst neighboring processors. Phases of the simulation are synchronized through barriers.

Gauss: a locally-developed solver for a system of linear equations AX = B using Gaussian Elimination and back-substitution. Rows are distributed among processors cyclically. Synchronization flags are used to signal when a pivot row becomes available.

Ilink: a widely used genetic linkage analysis program from the FASTLINK 2.3P [11] package that locates disease genes on chromosomes. A master processor performs a round-robin assignment of elements in a sparse array to a pool of slave processors. The slaves perform calculations on the assigned probabilities and report the results to the master. Barriers are used to synchronize between the master and slaves and between iterations in the program.

SOR: a Red-Black Successive Over-Relaxation program from the TreadMarks distribution. The program solves partial differential equations. The red and black arrays are divided into roughly equal size bands of rows, with each band assigned to a different processor. Processors synchronize using barriers.
TSP: a branch-and-bound solution to the traveling salesman problem. The program, also from the TreadMarks distribution, distributes work through a task queue. It is non-deterministic, in that parts of the search space can be pruned, depending on when short paths are found. The task queues are protected by locks.

Volrend: a SPLASH-2 application that renders a three-dimensional volume using a ray casting technique. The image plane is partitioned among processors in contiguous blocks, which are further partitioned into small tiles. These tiles serve as the basic unit of work and are distributed through a set of task queues. Again, the task queues are protected by locks.

Water-nsquared: a fluid flow simulation from the SPLASH-2 benchmark suite. The molecule structures are kept in a shared array that is divided into contiguous chunks and assigned to processors. The bulk of the interprocessor communication occurs during a phase that updates intermolecular forces (from within a radius of n/2 molecules, where n is the number of molecules), using per-molecule locks, resulting in a migratory sharing pattern.

Water-spatial: another SPLASH-2 fluid flow simulation that solves the same problem as Water-nsquared. The simulation space is placed under a uniform 3-D grid of cells, with each cell assigned to a processor. Sharing occurs when molecules move from one cell to another. In comparison with Water-nsquared, this application also uses a more efficient, linear algorithm. The application uses barriers and locks to synchronize.

The data set sizes and uniprocessor execution times for these applications are presented in Table 3. The size of shared memory space is listed in parentheses. Execution times were measured by running each uninstrumented application sequentially without linking it to the protocol library.
3.3 Performance

This subsection begins by discussing the impact of Memory Channel support, in particular, remote-write capabilities, inexpensive broadcast, and total-ordering properties, on the three types of protocol communication: shared data propagation, protocol meta-data maintenance, and synchronization. All protocols described in this subsection use a first-touch home node assignment (footnote 4). We found that eight of our eleven benchmark applications benefited from the special Memory Channel features. The improvement can be especially large (up to 37% over an explicit messaging protocol) in an application that dynamically distributes work. In this case, the special Memory Channel features serve to reduce protocol-induced overhead, thereby reducing load imbalance and costly work re-distributions.

Footnote 4: In the case of multiple sharers per page, the timing differences between protocol variants can lead to first-touch differences. To eliminate these differences and isolate Memory Channel impact, we captured the first-touch assignments from CSM-DMS and used them to explicitly assign home nodes in the other protocols.

Throughout this section, we will refer to Figure 1 and Table 4. Figure 1 shows a breakdown of execution time, normalized to that of the CSM-DMS protocol, for the six protocol variants. Execution time is broken down to show the time spent executing application code (User), executing protocol code (Protocol), waiting on synchronization operations (Synchronization), and sending or receiving messages (Message). Table 4 lists the speedups and statistics on protocol communication for each of the applications running on 16 processors. The statistics include the number of page transfers, invalidations, and diff operations. The table also lists the number of home migrations, along with the number of migration attempts (listed in parentheses).

3.3.1 The Impact of Memory Channel Features

Eight of our eleven applications show measurable performance improvements running on CSM-DMS (fully leveraging Memory Channel features) as opposed to CSM-None (using explicit messages). Volrend runs 37% faster on CSM-DMS than it does on CSM-None. Barnes, EM3D, LU, and Water-nsquared run 7-11% faster. Gauss and SOR run less than 4% faster. Three applications, CLU, Ilink, and TSP, are not sensitive to the use of Memory Channel features and do not show any significant performance differences across our protocols.

Volrend's performance is very sensitive to its workload distribution. Perturbation introduced by the protocol induces load imbalance and triggers expensive task stealing. As can be seen from Table 4, the number of page transfers and diffs increases as shared data propagation and protocol meta-data maintenance no longer leverage Memory Channel features. Despite performing all protocol communication with explicit messages, CSM-None performs better than CSM-S. On CSM-None, the application has better load balance and incurs less task stealing. CSM-None performs fewer diff operations, and instrumentation shows it also performs fewer accesses to the task queue lock. Regardless, Volrend performs poorly overall on all protocol versions: the best achieved speedup is only two on 16 processors.

Barnes exhibits a high degree of sharing and incurs a large amount of protocol and synchronization overhead. Performance slowly degrades across the protocols. In this application, remote writes and total ordering have the biggest impact on the cost of meta-data maintenance. These features permit an approximate 5% reduction in the large number of invalidations. Without the use of Memory Channel features, the invalidations require explicit messages to update the master directory entry. These messages result in higher protocol overhead and poorer synchronization characteristics (see Figure 1).

Water-nsquared has a large number of synchronization operations due to its use of per-molecule locks. However, across protocols, the application does not show a large synchronization time relative to some of the other applications. The large number of locks reduces per-lock contention, and largely limits the synchronization overhead to the synchronization mechanism and associated protocol overhead. This application benefits most from using the Memory Channel features to optimize lock synchronization. Figure 1 shows that the synchronization cost is highest in CSM-None. The application is written to encourage lock handoffs between neighboring processors, which incur negligible protocol overhead if within the same SMP node. However, as protocol-induced overhead increases across the protocols, more lock handoffs occur between processors on different nodes. These inter-node handoffs lead to more diffs and higher protocol and synchronization times. As in Volrend, CSM-DMS incurs the least amount of perturbation (imbalance) due to the protocol, which helps keep lock accesses inside nodes, thereby avoiding diff operations.

At the given matrix size, LU incurs a large amount of protocol communication due to the write-write sharing at row boundaries. Remote writes and total ordering are very effective at reducing the overhead of the diff and invalidation operations. The protocols using explicit messages show higher protocol and synchronization overhead, due to more expensive diffs and invalidations.
EM3D, Gauss, SOR, and Water-spatial all benefit from protocols that leverage the special Memory Channel support. In these applications, our instrumentation shows that most diffs are handled by an idle processor. For these applications, meta-data maintenance is again the area that benefits most from special Memory Channel support.

Of the remaining applications, CLU, Ilink, and TSP are not noticeably affected by the underlying Memory Channel support. CLU and TSP have little communication that can be optimized. Ilink, however, performs a large number of diffs, and might be expected to benefit significantly from remote write support. However, 90% of the diffs are applied at the home node by idle processors, so the extra overhead is somewhat hidden from application computation.

3.3.2 Home Node Migration: Optimization for a Scalable Data Space

Home node migration can reduce the number of remote memory accesses by moving the home node to active writers. Our results show that this optimization is very effective. Six of our eleven applications perform better using home node migration and explicit data propagation (CSM-MS-Mg) than using first-touch and remote-write data propagation (CSM-DMS) (footnote 5). Home node migration can reduce protocol overhead by reducing the number of twin/diffs and invalidations. In fact, this reduction can be so great that three of our applications obtain the best overall performance when using migration and explicit messages for all protocol communication.

Footnote 5: Migration cannot be used when data is placed in remotely-accessible network address space, because of the high cost of remapping.

Volrend, Water-spatial, and LU all benefit greatly from migration because the number of diff (and attendant twin) operations is significantly reduced (see Table 4). In fact, for these applications, CSM-None-Mg, which does not leverage the special Memory Channel features at all, outperforms the full Memory Channel protocol CSM-DMS by a range of 18% to 34%. Figure 1 shows that the protocol component of execution time is significantly decreased for these applications. In Volrend, this decrease is especially important since the reduced protocol overhead leads to better load balance and less task stealing.

On EM3D and Water-nsquared, the migration protocols CSM-MS-Mg and CSM-None-Mg perform better than their first-touch counterparts that use explicit messages for at least some protocol communication (CSM-MS, CSM-S, and CSM-None). The migration optimization again reduces the number of diff operations. However, this gain is offset by increased overhead of migration requests. The two migration protocols perform basically the same as the full MC protocol, CSM-DMS.

Barnes and Gauss are the only two applications to suffer under the migration optimization.
17 1 Barnes 1 CLU 1 LU Execution Breakdown (%) Execution Breakdown (%) Execution Breakdown (%) CSM-DMS 1 CSM-MS CSM-S CSM-None EM3D CSM-MS-MG CSM-None-MG 0 CSM-DMS 1 CSM-MS CSM-S CSM-None Ilink CSM-MS-MG CSM-None-MG 0 CSM-DMS 1 CSM-MS CSM-S CSM-None Gauss CSM-MS-MG CSM-None-MG Execution Breakdown (%) Execution Breakdown (%) Execution Breakdown (%) CSM-DMS 1 CSM-MS CSM-S CSM-None SOR CSM-MS-MG CSM-None-MG CSM-DMS 1 CSM-MS CSM-S CSM-None TSP CSM-MS-MG CSM-None-MG CSM-DMS 0 CSM-MS CSM-S CSM-None Volrend CSM-MS-MG CSM-None-MG Execution Breakdown (%) Execution Breakdown (%) Execution Breakdown (%) CSM-DMS 1 CSM-MS CSM-S CSM-None Water-NSQ CSM-MS-MG CSM-None-MG CSM-DMS 1 CSM-MS CSM-S CSM-None Water-SP CSM-MS-MG CSM-None-MG CSM-DMS CSM-MS CSM-S CSM-None CSM-MS-MG CSM-None-MG Execution Breakdown (%) 80 Message Synchronization Protocol User ofmemorychannelfeatures).mgdenotesamigratinghomenodepolicy. ThesuxontheprotocolnamerepresentstheareasofcommunicationusingMemoryChannelfeatures (D:sharedDatapropagation,M:protocolMeta-datamaintenance,S:Synchronization,None:Nouse Figure1:Normalizedexecutiontimebreakdownfortheapplicationsontheprotocolsat16processors CSM-DMS CSM-MS CSM-S CSM-None CSM-MS-MG CSM-None-MG Execution Breakdown (%) 80 CSM-DMS CSM-MS CSM-S CSM-None CSM-MS-MG CSM-None-MG
18 Table4:Applicationspeedupsandstatisticsat16processors.Thesuxontheprotocolnamerepresents theareasofcommunicationusingmemorychannelfeatures(d:shareddatapropagation,m:protocol amigratinghomenodepolicy. Meta-datamaintenance,S:Synchronization,None:NouseofMemoryChannelfeatures).Mgdenotes 18
19 Barnes,thedegreeofsharingisveryhighandthereisalargenumberofmigrationrequests.Theextra overheadoftheserequestsbalancesthereductionofdioperationsincsm-ms-mg.csm-none-mg losesperformancesincedirectorystateisnolongerkeptconsistentglobally.asaresult,csm-none-mg sendsapproximately580kunsuccessfulmigrationrequests.asshownintable4,gaussperformsmany overheadwithrespecttotherst-touchprotocols. moreinvalidationswhenusingmigration.theseinvalidationsresultinincreasedprotocolandmessaging thebenetsareosetbyincreasedoverheadduetomigrationcosts. tothemigrationmechanism.inilinkthenumberofdioperationsissignicantlyreduced,butagain CLU,Ilink,andTSPagainarerelativelyinsensitivetotheunderlyingMemoryChannelsupportor 4RelatedWork totalordering.theirresultsshowthatadvancednetworkfeaturesprovidelargeimprovementsinsdsm performance.theirnetworkhasbothremote-writeandremote-readcapabilities,butnobroadcastor Inatechnicalreport,Bilasetal.[4]alsoexaminetheimpactofspecialnetworkfeaturesonSDSM performance.however,theirbaseprotocolusesinter-processorinterruptstosignalmessagingdelivery. Interruptsoncommoditymachinesaretypicallyontheorderofhundredsofmicrosends,andsolargely erasethebenetsofalow-latencynetwork.ourevaluationhereassumesthatmessagescanbedetected throughamuchmoreecientpollingmechanism,asisfoundwithothersans[10,13],andsoeachof ourprotocolsbenetfromthesamelowmessaginglatency. 
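To make the contrast with interrupt-driven delivery concrete, the following is a minimal Python sketch of polling-based message detection. It is an illustration only, not the mechanism used in Cashmere or in the cited SANs: the names `PolledMailbox` and `compute_with_polling` are hypothetical, and message arrival is simulated by a dictionary keyed on work-item index.

```python
from collections import deque


class PolledMailbox:
    """Toy receive queue that the compute loop checks explicitly (no interrupts)."""

    def __init__(self):
        self._pending = deque()   # messages that have arrived but are not yet handled
        self.handled = []         # messages processed so far, in arrival order

    def deliver(self, msg):
        # Stand-in for the network depositing a message in local memory.
        self._pending.append(msg)

    def poll(self):
        # Drain every message that has arrived since the last poll.
        count = 0
        while self._pending:
            self.handled.append(self._pending.popleft())
            count += 1
        return count


def compute_with_polling(mailbox, work_items, arrivals):
    """Process work_items, polling the mailbox after each one.

    arrivals maps a work-item index to the messages that (in this
    simulation) arrive while that item is being processed.
    """
    total = 0
    for i, item in enumerate(work_items):
        total += item * item                 # application work
        for msg in arrivals.get(i, ()):
            mailbox.deliver(msg)             # simulated arrival during this item
        mailbox.poll()                       # cheap check at a known program point
    return total
```

In this sketch a message waits at most one work item before it is handled, and the only cost paid when no message is pending is the empty check inside `poll`. A real system would place the poll at points chosen by the protocol or compiler rather than after every unit of work.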
Amza et al. [3] describe adaptive extensions to the TreadMarks [2] protocol that avoid twin/diff operations on shared pages with only a single writer. (Pages with multiple writers still use twins and diffs.) Our home node migration scheme is similar in principle. If a page has only a single writer, the home always migrates to that writer, and so twin/diff operations are avoided. In the presence of multiple concurrent writers, our scheme will always migrate to one of the multiple concurrent writers, thereby avoiding twin/diff overhead at one node. Cashmere is also able to take advantage of the replicated directory when making migration decisions (i.e., to determine if the home is currently writing the page).

There has also been much work in adapting coherence protocol operations to migratory access patterns. In a migratory access pattern, a piece of data is read and written by a succession of processors in a lockstep manner. This pattern results in the transfer of data from one processor to another, and usually involves two coherence operations (each with multiple messages), one for the read and one for the write. Recent work [24, 7, 19] in both hardware and software coherent systems discusses methods to classify migratory data and then collapse the two coherence messages into one. This technique could be built into our system, and may be very helpful in reducing the overhead due to unnecessary migration requests.

5 Conclusions

In this paper, we have studied the effect of advanced network features, in particular, remote writes, inexpensive broadcast, and total ordering, on SDSM. Our evaluation used the state-of-the-art Cashmere protocol, which was designed with these network features specifically in mind.

We have found that these features never hurt performance and do indeed lead to modest performance improvements (up to 11%) for most applications. The improvements are due to a decrease in communication overhead and, correspondingly, in protocol overhead. One application, however, improves dramatically, by 37%. This application uses a dynamic work distribution scheme, which operates more effectively with the reduced protocol overhead. Unfortunately, even after the improvement, the application only obtains an extremely poor speedup of two on 16 processors.

Virtually all of the performance differences we have seen are due to optimized meta-data maintenance. The use of remote writes to propagate data modifications has little impact. In barrier-based programs, this can be expected: instrumentation shows that most diff messages are handled by idle processors. The network features have little effect on the operational cost of synchronization primitives, so optimization in this area has little effect on overall performance.

Finally, we also found that home node migration is a very effective mechanism for reducing the number of twin/diff operations and the resulting protocol overhead. The mechanism is so effective that the benefits outweigh those from using the network features for shared data propagation. Shared data can thus safely be placed in the node's private memory. The pressure on remotely accessible memory is thereby greatly reduced, providing more flexibility and scalability for the system.
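As an illustration of why migration removes twin/diff work, the following is a minimal Python sketch of the twin/diff mechanism together with a simplified migration decision. It is a sketch under stated assumptions, not the Cashmere implementation: pages are modeled as lists of words, all names (`make_twin`, `compute_diff`, `apply_diff`, `PageDirectory`) are hypothetical, and the policy shown (migrate to a requesting writer unless the current home is itself writing the page) is a simplification of the directory check described in the paper.

```python
def make_twin(page):
    """Copy the page before the first write so later changes can be detected."""
    return list(page)


def compute_diff(twin, page):
    """Return {word_index: new_value} for every word that differs from the twin."""
    return {i: new for i, (old, new) in enumerate(zip(twin, page)) if old != new}


def apply_diff(master_copy, diff):
    """Merge a writer's modifications into the master copy at the home node."""
    for index, value in diff.items():
        master_copy[index] = value


class PageDirectory:
    """Per-page directory entry: current home node and the set of active writers."""

    def __init__(self, home):
        self.home = home
        self.writers = set()

    def request_write(self, node):
        """Return True if node may write the master copy directly (no twin needed)."""
        if node == self.home:
            self.writers.add(node)
            return True
        if self.home not in self.writers:
            # Assumed policy: the home is not writing, so migrate it to the writer.
            self.home = node
            self.writers.add(node)
            return True
        # The home is an active writer: this node must twin now and send a diff later.
        self.writers.add(node)
        return False

    def release(self, node):
        """Record that node has finished its current write interval."""
        self.writers.discard(node)
```

A writer that becomes the home skips `make_twin`, `compute_diff`, and `apply_diff` entirely, which is the saving the conclusions attribute to migration. A refused request in this sketch corresponds to the migration requests that add overhead without removing any twin/diff work.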
References

[1] S. V. Adve and M. D. Hill. A Unified Formulation of Four Shared-Memory Models. IEEE Transactions on Parallel and Distributed Systems, 4(6):613-624, June 1993.

[2] C. Amza, A. L. Cox, S. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Yu, and W. Zwaenepoel. TreadMarks: Shared Memory Computing on Networks of Workstations. Computer, 29(2):18-28, February 1996.

[3] C. Amza, A. Cox, S. Dwarkadas, and W. Zwaenepoel. Software DSM Protocols that Adapt between Single Writer and Multiple Writer. In Proceedings of the Third International Symposium on High Performance Computer Architecture, San Antonio, TX, February 1997.

[4] A. Bilas, C. Liao, and J. P. Singh. Network Interface Support for Shared Virtual Memory on Clusters. Technical Report, Department of Computer Science, Princeton University, March 1998.

[5] J. B. Carter, J. K. Bennett, and W. Zwaenepoel. Implementation and Performance of Munin. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles, pages 152-164, Pacific Grove, CA, October 1991.

[6] Y. Chen, A. Bilas, S. N. Damianakis, C. Dubnicki, and K. Li. UTLB: A Mechanism for Address Translation on Network Interfaces. In Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, October 1998.

[7] A. L. Cox and R. J. Fowler. Adaptive Cache Coherency for Detecting Migratory Shared Data. In Proceedings of the Twentieth International Symposium on Computer Architecture, San Diego, CA, May 1993.

[8] D. Culler, A. Dusseau, S. Goldstein, A. Krishnamurthy, S. Lumetta, T. von Eicken, and K. Yelick. Parallel Programming in Split-C. In Proceedings, Supercomputing '93, pages 262-273, Portland, OR, November 1993.

[9] M. Dubois, J. C. Wang, L. A. Barroso, K. L. Lee, and Y.-S. Chen. Delayed Consistency and its Effect on the Miss Rate of Parallel Programs. In Supercomputing '91 Proceedings, pages 197-76, Albuquerque, NM, November 1991.

[10] D. Dunning, G. Regnier, G. McAlpine, D. Cameron, B. Shubert, F. Berry, A. M. Meritt, E. Gronke, and C. Dodd. The Virtual Interface Architecture. In IEEE Micro, pages 66-76, March 1998.

[11] S. Dwarkadas, A. A. Schaffer, R. W. Cottingham Jr., A. L. Cox, P. Keleher, and W. Zwaenepoel. Parallelization of General Linkage Analysis Problems. Human Heredity, 44:127-141, 1994.

[12] S. Dwarkadas, K. Gharachorloo, L. Kontothanassis, D. J. Scales, M. L. Scott, and R. Stets. Comparative Evaluation of Fine- and Coarse-Grain Approaches for Software Distributed Shared Memory. In Proceedings of the Fifth International Symposium on High Performance Computer Architecture, Orlando, FL, January.

[13] T. v. Eicken, A. Basu, V. Buch, and W. Vogels. U-Net: A User-Level Network Interface for Parallel and Distributed Computing. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, Copper Mountain, CO, December 1995.

[14] A. Erlichson, N. Nuckolls, G. Chesson, and J. Hennessy. SoftFLASH: Analyzing the Performance of Clustered Distributed Virtual Shared Memory. In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, pages 210-2, Cambridge, MA, October 1996.

[15] R. Gillett. Memory Channel: An Optimized Cluster Interconnect. IEEE Micro, 16(2):12-18, February.

[16] R. W. Horst and D. Garcia. ServerNet SAN I/O Architecture. In Proceedings of Hot Interconnects V Symposium, Palo Alto, CA, August.

[17] L. Kontothanassis, G. Hunt, R. Stets, N. Hardavellas, M. Cierniak, S. Parthasarathy, W. Meira, S. Dwarkadas, and M. L. Scott. VM-Based Shared Memory on Low-Latency, Remote-Memory-Access Networks. In Proceedings of the Twenty-Fourth International Symposium on Computer Architecture, pages 157-169, Denver, CO, June 1997.

[18] M. Marchetti, L. Kontothanassis, R. Bianchini, and M. L. Scott. Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems. In Proceedings of the Ninth International Parallel Processing Symposium, Santa Barbara, CA, April 1995.

[19] L. R. Monnerat and R. Bianchini. Efficiently Adapting to Sharing Patterns in Software DSMs. In Proceedings of the Fourth International Symposium on High Performance Computer Architecture, Las Vegas, NV, February 1998.

[20] R. Samanta, A. Bilas, L. Iftode, and J. Singh. Home-Based SVM Protocols for SMP Clusters: Design and Performance. In Proceedings of the Fourth International Symposium on High Performance Computer Architecture, pages 113-124, February 1998.

[21] D. J. Scales and K. Gharachorloo. Towards Transparent and Efficient Software Distributed Shared Memory. In Proceedings of the Sixteenth ACM Symposium on Operating Systems Principles, St. Malo, France, October 1997.

[22] D. J. Scales, K. Gharachorloo, and A. Aggarwal. Fine-Grain Software Distributed Shared Memory on SMP Clusters. In Proceedings of the Fourth International Symposium on High Performance Computer Architecture, Las Vegas, NV, February 1998.

[23] J. P. Singh, W.-D. Weber, and A. Gupta. SPLASH: Stanford Parallel Applications for Shared-Memory. ACM SIGARCH Computer Architecture News, (1):5-44, March 1992.

[24] P. Stenstrom, M. Brorsson, and L. Sandberg. An Adaptive Cache Coherence Protocol Optimized for Migratory Sharing. In Proceedings of the Twentieth International Symposium on Computer Architecture, San Diego, CA, May 1993.

[25] R. Stets, S. Dwarkadas, N. Hardavellas, G. Hunt, L. Kontothanassis, S. Parthasarathy, and M. Scott. Cashmere-2L: Software Coherent Shared Memory on a Clustered Remote-Write Network. In Proceedings of the Sixteenth ACM Symposium on Operating Systems Principles, St. Malo, France, October 1997.

[26] M. Welsh, A. Basu, and T. von Eicken. A Comparison of ATM and Fast Ethernet Network Interfaces for User-level Communication. In Proceedings of the Third International Symposium on High Performance Computer Architecture, San Antonio, TX, February 1997.

[27] S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. Methodological Considerations and Characterization of the SPLASH-2 Parallel Application Suite. In Proceedings of the Twenty-Second International Symposium on Computer Architecture, Santa Margherita Ligure, Italy, June.
