DynamO: Dynamic Objects with Persistent Storage

Jiong Yang, Silvia Nittel, Wei Wang, and Richard Muntz
University of California, Los Angeles
Department of Computer Science
Los Angeles, CA 90095
{jyang,silvia,weiwang,muntz}@cs.ucla.edu

Abstract

In light of advances in processor and networking technology, especially the emergence of network-attached disks, the traditional client-server architecture becomes suboptimal for many computation/data-intensive applications, e.g., data mining, scientific computing, image processing, etc. In this paper, we introduce a revised architecture for this kind of application: the Dynamic Object server environment (DynamO). The main innovation of this architecture is that the functionality of a persistent storage server is divided into modules which are dynamically migrated to the client on demand. Also, data is transferred directly to the client's cache from network-attached disks, thus avoiding multiple copies from a disk to the server buffer to the network and the client. In this way, a client only places a small load on the server, and also avoids the I/O bottleneck on the server. Furthermore, DynamO employs a distributed cache management allowing several clients to share the in-memory data by using the concept of "who uses it, serves it to others". We show via simulation models how this architecture increases the system's adaptability, scalability and cost performance.

1 Introduction

Client-server architectures have been popular over the past decades, and have also gained wide acceptance in the database community. In a client-server architecture, the system code is divided into two portions: one portion, the so-called server, provides basic services such as data I/O, buffer management and concurrency control, and runs on a dedicated machine such as a workstation or SMP; the other portion, the so-called client, provides the API and executes on the same machine as the application. Normally, a server interacts with many clients. In such an architecture, the scalability and performance of the overall system significantly depend on the compute power, aggregate bandwidth, etc. of the server machine, the data I/O rate of clients, and the scalability of the server itself.

Advances in processor and local area network technology make it possible for different software architectures to emerge. In recent years, two major trends in hardware development have impacted the efficiency of the client-server architecture: the emergence of network-attached storage, and the increase of CPU power of server and client machines. At the beginning of the nineties, the typical bandwidth of a "fast" network was in the range of 10 Mbit/sec while a typical system bus' bandwidth was in the range of tens of MByte/sec, making the network the bottleneck for data-intensive computing and disk I/O. Servers used directly attached disks and often processed data first and sent the output data to clients, thus reducing the load on the network. Today, point-to-point connected Fibre Channel is capable of transferring data at 100 MByte/sec, and the industry projection is that its bandwidth will reach 400 MByte/sec soon [FCA], while during the past decade the sustained bandwidth of the typical system bus has increased by only several tens of MByte/sec. Thus, the network is no longer the bottleneck in a LAN environment and it is feasible to attach storage devices directly to the network instead of to a server machine.

The rate of increase in the bandwidth of a single disk is about 40% a year, and the price per MB drops about 60% per year [Gro96]. By the year 2000, a megabyte of disk will cost about four cents and each individual disk will sustain a bandwidth on the order of 40 to 50 MB/sec. This trend suggests that systems will have much higher aggregate disk I/O bandwidth in the near future (400 MB/sec point-to-point) if attached to a Fibre Channel based network. As a result, large aggregate disk bandwidth is not a problem until we consider how to get the data to the processor, considering that the sustained bandwidth of a single S-bus
or PCI bus is in the range of 20 to 60 MB/sec [Arp97] today, especially if we assume a single server machine that performs the disk I/O for many client machines.

Another relevant hardware trend concerns the CPU power of client and server machines. Ten years ago, server machines were equipped with much more powerful CPUs than the client machines. As a result, servers were designed to perform most of the work within a client-server based system. These facts have changed dramatically during the past ten years. Today, client machines are equipped with powerful CPUs similar to server machines; furthermore, they are normally less utilized than server machines. The average number of CPUs in a server machine is up to thirty (in an SMP machine) compared to two or four CPUs in current client workstations; however, for applications that do not exhibit high parallelism, transferring more work to the clients does not necessarily increase response time. Usually, there are many more client machines than server machines, and the aggregate compute power of the clients is often more than that of the server machines. (Under these circumstances, moving some of the workload to clients can relieve the server and reduce queueing delays.)
Based on the technology trends discussed above, we introduce a more scalable approach to persistent object systems: Dynamic Objects with Persistent Storage (DynamO). DynamO provides an application interface and functionality similar to traditional persistent object systems (POS) such as Exodus [Car86], Mneme [Mos88] or Kiosk [Nit96], and allows the storing, clustering and retrieving of storage objects, each of which consists of an object identifier and an unstructured byte container. DynamO's architecture, however, is different from the traditional client-server architecture of such systems. The system has a layered architecture consisting of an I/O layer, a buffer management layer, and an object management layer, also providing transaction management. In DynamO, the object management layer resides on the client machine, and interacts with a server part on the server machine. However, the server part is much smaller in DynamO, and acts like a coordinator. At data access time, the necessary server code for buffer management and catalog information is dynamically downloaded to the client machine, and then runs on the client machine. (The dynamic download is not really necessary; alternatively, it could reside permanently on the client machine.) The object management layer communicates with the coordinator on the server machine about the location of relevant data. However, instead of loading data through the server machine, the object management on the client machine interacts with DynamO's I/O layer that resides on the disk controllers of the network-attached disks. This I/O layer performs physical and logical device management, and provides the abstraction of data pages to the object management layer. Requested data is directly retrieved from the network-attached disk and cached locally on the client machine, thus eliminating the bottleneck caused by the server's bus bandwidth limitation (see Figure 1).
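The difference between the two data paths can be made concrete with a toy model (our own illustration, not part of the paper): count how many times a page crosses the server machine's bus on its way from disk to a client.

```python
# Hypothetical sketch (names are ours): contrast the path of a page read in
# the classic client-server POS with DynamO's direct path from a
# network-attached disk into the client's cache.

CLIENT_SERVER_PATH = [
    "disk -> server buffer",      # the server machine performs the I/O
    "server buffer -> network",   # the page crosses the server's bus again
    "network -> client cache",
]

DYNAMO_PATH = [
    "disk -> network",            # the I/O layer runs on the disk controller
    "network -> client cache",    # the page never touches the server machine
]

def server_bus_crossings(path):
    """Count hops that move the page over the server machine's bus."""
    return sum(1 for hop in path if "server" in hop)

cs_hops = server_bus_crossings(CLIENT_SERVER_PATH)
dy_hops = server_bus_crossings(DYNAMO_PATH)
```

Under this toy accounting, every page read in the traditional architecture consumes server bus bandwidth twice, while in DynamO it consumes none, which is the bottleneck the section argues is eliminated.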
DynamO eliminates the traditional buffer architecture of POS in which the system buffer resides on the server machine. In DynamO, each client acts as a cache for local data that is shared with other clients, thus providing a distributed cache. Since the collective memory on all client machines is usually much larger than that on server machines, the cache hit rate can be improved, and disk I/O avoided. For example, it is reported that the cache hit rate doubles in the NOW environment, which employs this type of distributed client cache scheme [And96].

The redesigned architecture of a persistent object system accounts for higher performance and significantly improved scalability. In a data-intensive computing application, such as data-intensive persistent programs, the server machine's bus can easily become a bottleneck in the traditional architecture. However, in DynamO, since the data does not go through the server machine's bus, this bottleneck is eliminated. On the other hand, for computation- and data-intensive applications, such as database applications with a large number of clients, a large percentage of the work is done on the server machine(s), so that the server machine's CPU is highly utilized in the client-server environment, thus impacting the performance of the overall system. In DynamO, most of this same work is done on client machines, thus the server machine's CPU will not saturate as quickly and the proposed architecture is more scalable. Also, for data-intensive realtime applications, such as multimedia, the number of clients a server can accommodate is limited by the server machine's compute power and the aggregate bandwidth it can support in the client-server environment. However, in DynamO, it is only limited by the aggregate bandwidth of the network and the aggregate compute power on client machines, which is much larger. Our simulation results show that DynamO has much better scalability and performance than the traditional client-server architecture.

We do not claim that DynamO works better than the traditional client-server architecture for all applications. However, if an application requires a large number of CPU cycles and/or access to a large quantity of data, then moving method execution to clients and enabling direct access to storage devices can effectively remove the server bottleneck.

The remainder of this paper is organized as follows. We introduce related work in Section 2, and discuss the requirements for and problems of a distributed persistent object system in Section 3. In Section 4, we present and discuss the DynamO approach, and compare the performance of DynamO with the traditional approach to persistent object systems in Section 5. Section 6 contains our conclusions and future work.

Figure 1: Application processing paradigms. (a) Client-server architecture: an application invokes the client; the client sends a request to the server; the server processes the request, issues I/O to its local disks, which return data over its PCI bus; the server processes the data and sends the results to the client; the client continues its work. (b) DynamO-style architecture: the client sends a request to the server; the server sends the necessary code and a handle to locate the data; the client processes the server's message and sends I/O requests to the network-attached storage (NAS); the NAS retrieves the data and sends it to the client; the client executes the code on the data and continues its work.

2 Related Work

Work related to DynamO can mostly be found in the area of file systems in distributed environments using network-attached disks, and in the research done on network-attached disks, distributed cache management, and delegation of processing to clients.
2.1 Serverless Network File System

As clients are added to a LAN, the file server can become saturated. To address this problem, the serverless network file system (xFS) was developed on the Network of Workstations (NOW) at the University of California at Berkeley [And96]. All workstations are connected by a fast local area network, and disk devices are attached to all workstations. In NOW, part or all of the client workstations can act cooperatively as a file manager or storage server, or both. Here a file manager maps a file into a set of pages while a storage server maps pages into disk blocks. As a result, this file system architecture provides high scalability. Moreover, it employs a cooperative cache technique. When one client tries to access data which is not cached in its memory, it asks the distributed file manager for that data. In turn, the file manager checks whether or not the data is cached at some other client. If so, the cached data is sent to the client. Otherwise, the requested data is fetched from disks.

NOW successfully uses all workstations' memory and bus bandwidth. As a result, the cache hit ratio is higher and better scalability of the file server is achieved [And96]. Moreover, client workstations not only act in concert as the file manager and storage server, but also execute application code. Therefore, the workloads on clients can be highly variable, and the utilization of resources, e.g., memory, CPUs, disks, and system buses, can be quite different among peer workstations. Which pages of a file are served by which file manager is determined statically. This means that a file manager manages a fixed set of pages. In an environment where the resources can change frequently (e.g., people bring in their own laptops and plug into the network in the morning, and bring the laptops home at night), how to utilize these resources as file managers becomes a challenge which is not addressed in the xFS system. Thus, how to balance the utilization of resources among all collaborating workstations remains an open question. To balance the workload evenly, DynamO allows dynamic change of ownership (file manager) of files (data); thus it provides the mechanisms to support adaptability to dynamic workload and resource environments.

In addition, xFS was developed for UNIX file systems, and uses a log-based file writing technique. However, this technique does not work well in the environment of databases or object servers, because databases and object servers usually have explicit requirements for data allocation, e.g., sequential access, which is not well served by a log-structured file system. Furthermore, xFS uses traditional server-attached disks. The storage server executes on client workstations and consumes many precious workstation CPU cycles. To conserve these workstation CPU cycles, DynamO puts much of the storage server functionality (i.e., the I/O manager) on the network-attached disk controllers.

2.2 NASD

Usually, when a file server retrieves data from storage, it first copies the data to its own memory, then sends the data to the clients. In order to eliminate copying the data to the server's buffers first, some file systems today use a "third party transfer" mechanism, such as the Network Attached Secure Disks (NASD) system. NASD is a current research project at CMU [Gib97]. In NASD, a client contacts the file manager on the server machine when it tries to open a file. The file manager verifies whether the client has permission to access that file. If so, the file manager notifies the disk controller and gives a file handle to the client. On subsequent read accesses, the client does not need to contact the file manager; the client can directly contact the disk controller.

NASD still employs a centralized file manager that enforces consistency control. In addition, the NASD project focuses more on security issues. DynamO is more focused on the issues of distributed cache management; it uses a distributed object manager and distributed cache manager, which can provide better scalability.
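The cooperative-cache lookup described for xFS/NOW can be sketched as follows; the class and method names are illustrative stand-ins of our own, not the actual xFS interfaces.

```python
# Hedged sketch of a cooperative cache: a client misses locally, asks the
# (distributed) file manager, and the manager either redirects it to a peer
# that caches the page or falls back to a disk read.

class FileManager:
    def __init__(self):
        self.cached_at = {}               # page id -> client holding it

    def locate(self, page):
        return self.cached_at.get(page)   # None => cached nowhere

    def note_cached(self, page, client):
        self.cached_at[page] = client

def read_page(client, page, manager, disk_reads):
    owner = manager.locate(page)
    if owner is not None and owner != client:
        source = f"cache of {owner}"      # served peer-to-peer
    else:
        source = "disk"
        disk_reads.append(page)           # record a physical read
    manager.note_cached(page, client)
    return source

disk_reads = []
fm = FileManager()
first = read_page("A", "p1", fm, disk_reads)   # global miss -> disk
second = read_page("B", "p1", fm, disk_reads)  # hit in A's cache
```

The second request is satisfied from a peer's memory rather than disk, which is the effect behind the reported doubling of the cache hit rate.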
2.3 Condor and Thin Clients

The server bottleneck has been a long-fought battle, and several solutions have been proposed. We refer to two influential approaches: Condor and the 'thin clients' architecture. Condor [Tan95] treats all workstations in its environment, connected via a fast local area network, as a pool of resources: memory, disk, and processors. If a workstation becomes idle and a job is waiting, the job is assigned to the idle workstation for execution. Once the user of the workstation invokes his own computations, the Condor controlling daemons will halt execution of any "visiting" job and move its execution to another machine. However, Condor is not designed for persistent object servers, because any program can be executed on any machine. DynamO addresses the persistent object server issues by dynamically loading the server functionality onto all client machines.

The thin client architecture is another innovative approach to dynamically moving the execution of application code. "Thin" refers to both the machine and the application. A thin client (machine) has no hard disk and minimal memory; thus, this approach is only useful for simple computations using small amounts of data. In the 'thin clients' architecture, clients dynamically download the code and data that they need for execution of a user job. Data-intensive execution, however, has to be moved to the server machine. Also, there is no cache kept on the client machine, and it is less feasible for database applications.
2.4 Related Work on Cache Coherency Control

With the advent of distributed and shared-disk environments, some work related to DynamO's cache coherency system has been done. Similar to DynamO, the work of [Dias89] combines the CPU power of several low-end computer systems, and introduces integrated concurrency-coherency control to reduce the overhead of coupling a large number of systems. Furthermore, this system uses an 'intermediate' buffer that can be accessed by all systems so that data I/O is minimized for all participants. It is not clear whether in this approach the intermediate buffer is made available by one machine or via shared main memory of all the participating systems; however, as in DynamO, each of the participating systems manages a disjoint region of the intermediate buffer. In contrast to DynamO, the intermediate buffer partitions are statically assigned here, lacking the flexibility DynamO offers by assigning work based on usage pattern and actual workload of the client system. Furthermore, DynamO employs a more flexible scheme of allocating and managing chunks of memory in order to keep the management effort per page minimal, since we assume that a large number of pages is managed in such a system.

3 Scalability and Performance of a Persistent Object System

When designing and implementing a persistent object system for applications such as database systems with a large number of clients, scientific data- and computation-intensive persistent applications, and multimedia applications, the scalability of the client-server architecture and the server bottleneck resulting from a server machine's system bus bandwidth limitation are important design issues to consider. In this section, we describe the basic architecture of a persistent storage system for data- and computation-intensive applications, and discuss its bottlenecks and problems related to scalability.

3.1 Overview

Storage object servers have been developed as storage back-ends for persistent programming languages and for non-standard database management systems (DBMS) such as object-oriented DBMS. Today, this storage system technology is also used for high-intensity applications such as large database systems with many users, or as storage systems for data- and computation-intensive scientific programming systems as well as multimedia applications.

A storage system offers applications storage objects consisting of a storage object identifier and an unstructured, variable-sized byte container. The middle layer between the I/O manager and the application interface consists of a specialized buffer management, employing flexible buffering strategies that support the specialized access behavior of the above-mentioned applications and improve the buffer hit rate.
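A minimal sketch of this storage-object abstraction, with a buffer layer sitting between the application and the I/O manager; the class and method names are our own illustration, not the interface of any particular POS.

```python
# A storage object is an identifier plus an unstructured, variable-sized
# byte container; the buffer manager caches objects fetched from below.

class StorageObject:
    def __init__(self, oid, payload=b""):
        self.oid = oid
        self.payload = payload       # unstructured, variable-sized bytes

class BufferManager:
    """Middle layer: caches objects obtained from the I/O manager."""
    def __init__(self, io_fetch):
        self.io_fetch = io_fetch     # callback into the I/O layer
        self.pool = {}
        self.hits = 0
        self.misses = 0

    def get(self, oid):
        if oid in self.pool:
            self.hits += 1           # served from the buffer
        else:
            self.misses += 1
            self.pool[oid] = self.io_fetch(oid)
        return self.pool[oid]

store = {1: StorageObject(1, b"\x00" * 128)}   # stand-in for the I/O manager
buf = BufferManager(store.__getitem__)
obj = buf.get(1)
obj = buf.get(1)                               # second access hits the buffer
```

A real buffer manager would add eviction and the flexible, per-application buffering strategies the text mentions; this sketch only shows where that layer sits.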
While the classical client-server architecture for a storage object server has performed well for OODBMS with a limited number of users and data, its architecture has limited scalability and performance for high-intensity applications. Typically, scientific and multimedia applications require a much larger data throughput than the more traditional applications of a storage object server. Also, traditional DBMS applications encounter scalability and performance problems if a large number of DBMS clients has to be served. For the remainder of the paper, we assume that both client and server machines are workstations. Assumptions about their performance characteristics will be described later.

Figure 2: Fibre Channel Arbitrated Loop. (Clients, servers, and network-attached disks are all attached to one Fibre Channel Arbitrated Loop.)

3.2 Problems

In this paper, we will compare the performance of the DynamO architecture with the traditional client-server architecture. The main purpose is to (a) evaluate the potential benefits of the DynamO approach and (b) understand what first-order factors influence the performance tradeoffs.

We use bottleneck analysis to discuss some of the major aspects of the performance comparison between two system models: traditional client-server and DynamO. The arguments in this section are more qualitative and
are meant to serve as a roadmap for the more detailed simulation results which follow in later sections. We are interested in how the bottleneck shifts in response to changes in the arrival rates, the service rates, and so on. Figure 3 illustrates some aspects of the queuing network model for the general system environment.

Figure 3: High Level System Description. (Client machines c_1, c_2, ..., c_i and a server machine s form a closed queuing network model of the LAN environment.)

In a LAN system, many different applications may be running at the same time. Without loss of generality, we assume for simplicity that only one kind of application exists in the LAN. We assume a closed queuing system, in which there are multiple client machines and only one server machine. The average service rate of client machine i is denoted by c_i, while the average service rate of the server machine is denoted by s. c_i includes both the application processing time and the client processing time, because both of them are executed on client machines.

For an application, the average server service rate could be larger than the average client service rate c_i (in other words, the average server service time is shorter than the average client service time). But the utilization level of the server machine can be much higher in the traditional client-server model, due to the large number of client machines. In the DynamO architecture, c gets smaller while s gets larger, because a large fraction of the load is migrated to the client machines. Thus, the server machine utilization level is reduced significantly while the client machine utilization level increases only by a small fraction. Therefore, the server bottleneck can be alleviated.

We assume that storage devices are not a bottleneck for data-intensive applications, because we can use RAID systems to stripe the data over multiple disk devices. Thus, the server system bus is most likely to be the bottleneck in the client-server architecture. On the other hand, DynamO enables client machines to directly access the storage devices, and thus the aggregate system bandwidth is not limited by the server system buses.

4 The DynamO Architecture

A persistent object server can become congested in data/computation-intensive applications. To eliminate this problem and achieve higher performance, we propose the DynamO architecture. The main idea of DynamO is to dynamically move object server functionality to the client machines for execution, and to allow the client machines direct access to network-attached storage. In the following, we present the architecture and principles of DynamO, and focus on data I/O, cache management, and interfaces to applications. The architecture of DynamO is depicted in Figure 4; we will describe the system from the bottom up. From an application point of view, the functionality of DynamO is very similar to a persistent storage system such as Exodus [Car86] and Mneme [Mos88]; therefore, we will focus on architecture issues in this section.
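The bottleneck argument above can be illustrated with back-of-the-envelope numbers (ours, not the paper's simulation parameters): with N clients sharing one server, the server's utilization grows in proportion to N times the per-request server demand, while each client's utilization depends only on its own demand.

```python
# Toy open-system utilization model; all rates and demands are made up.
# Each of n_clients issues requests at rate lam (req/sec); d_server and
# d_client are the per-request service demands (seconds) at the server
# and at a single client.

def utilizations(n_clients, lam, d_server, d_client):
    rho_server = n_clients * lam * d_server  # all requests share one server
    rho_client = lam * d_client              # each client serves only itself
    return rho_server, rho_client

# Traditional client-server: most work (say 8 ms/request) on the server.
cs = utilizations(n_clients=50, lam=10.0, d_server=0.008, d_client=0.002)

# DynamO-style split: the bulk of that work migrates to the clients.
dy = utilizations(n_clients=50, lam=10.0, d_server=0.001, d_client=0.009)
```

With these numbers the client-server configuration demands a server utilization above 1 (i.e., the server saturates), while in the DynamO split the server stays well below saturation and each client's utilization rises only slightly, mirroring the qualitative claim in the text.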

4.1 I/O Service

The lowest layer of DynamO is the I/O layer, providing data I/O from and to storage devices. The I/O layer maps files and data pages to storage locations on a storage device in a similar fashion to an I/O layer in a conventional persistent object system. To load data, DynamO employs a technique similar to that employed by NASD [Gib97]. When an application invokes an operation on an object, the object manager running on the client machine requests some storage objects from the server portion (the coordinator) on the server machine, by passing the identifiers of the objects that contain the data blocks. However, instead of fetching data for the client, the server plays the role of a coordinator, and performs the mapping of the object identifiers to pages. This mapping is performed centrally on the server machine, so that consistency problems for catalog information are minimized. The coordinator returns to the clients the relevant page and file identifiers, as well as the disk identifier(s) of the disk(s) that contain the data pages. The NASD servers implement consistency control; in contrast, the DynamO coordinator does not implement consistency control. In DynamO, the clients cooperate in providing cache consistency. Further, the cache consistency protocol can be tailored to the objects.
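The coordinator's role can be sketched as a pure catalog lookup that never touches the data itself; the catalog layout and identifiers below are hypothetical.

```python
# Sketch of the coordinator described in 4.1: it maps object identifiers to
# (file, page, disk) locations, and the client then talks to the disks
# directly, bypassing the server's bus.

class Coordinator:
    def __init__(self, catalog):
        self.catalog = catalog     # oid -> (file id, page id, disk id)

    def lookup(self, oids):
        """Return locations; no data pages flow through the coordinator."""
        return {oid: self.catalog[oid] for oid in oids}

catalog = {
    "obj7": ("file2", "page31", "disk0"),
    "obj9": ("file2", "page33", "disk1"),
}
coord = Coordinator(catalog)
locations = coord.lookup(["obj7", "obj9"])
# The client now issues I/O requests to disk0 and disk1 directly.
```

Keeping this mapping central on the server is what minimizes catalog consistency problems, while the data path stays entirely between the client and the network-attached disks.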
After obtaining the page, file and disk identifiers, the object manager on the client machine directly interacts with the portion of DynamO that runs on the disk controller(s) of the network-attached disks. This I/O model is based on the following fact: since the disk controller's CPU is mostly only lightly utilized, we can use it to delegate the execution of the two lowest levels of the persistent storage system, i.e., disk block allocation and free space management, as well as object page to disk block mapping (the page manager). The page manager is divided into two components: a strategy component and a storage allocation component. The allocation component uses the input from the strategy component to allocate disk blocks, while the strategy component decides how blocks are allocated for sets of pages. We assume that dedicated storage devices are available for the persistent object server, so that the allocation strategy is common for the entire disk (e.g., contiguous block allocation, clustering, etc.). However, if this is not the case and the storage device is shared with other applications, such as a page allocator for a UNIX file system, we assume that a storage partition is allocated for the persistent object server, and that the allocation for this partition is managed via the DynamO page allocation strategy component. The I/O manager on the disk controller retrieves the relevant pages from the disk, and sends them to the requesting object manager. The pages are stored in the local cache on the client. The object manager, finally, performs the object-to-page mapping, and makes the requested storage objects available for processing. The I/O layers are illustrated in Figure 4.

4.2 Cache Management

Data retrieved from disk can reside in the client's cache. In order to avoid repeated retrievals from disk, we employ a distributed cache management scheme, and allow clients to retrieve data directly from other clients' caches. The cache management consists of two layers: the distributed cache management layer and the local cache management layer, as shown in Figure 4.
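The strategy/allocation split of the page manager can be rendered minimally as follows, assuming a simple contiguous-placement strategy; the interfaces are our own invention, not DynamO code.

```python
# The strategy component decides *how* blocks are laid out for a set of
# pages; the allocation component does the actual block bookkeeping on the
# disk controller, using the strategy's plan.

class ContiguousStrategy:
    """Place a page set in one contiguous run of blocks (clustering)."""
    def plan(self, pages, next_free):
        return {p: next_free + i for i, p in enumerate(pages)}

class Allocator:
    def __init__(self, strategy):
        self.strategy = strategy
        self.next_free = 0
        self.page_to_block = {}

    def allocate(self, pages):
        plan = self.strategy.plan(pages, self.next_free)
        self.page_to_block.update(plan)
        self.next_free += len(pages)
        return plan

alloc = Allocator(ContiguousStrategy())
first = alloc.allocate(["p0", "p1", "p2"])   # blocks 0..2
second = alloc.allocate(["q0", "q1"])        # blocks 3..4
```

Swapping in a different strategy object (e.g., one that interleaves page sets) changes the layout policy without touching the allocator, which is the point of the two-component design.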
Hierarchical Model

Cache management is a complicated issue, mainly due to consistency requirements. To address this issue, persistent objects in DynamO are organized in a hierarchical manner, as shown in Figure 5. At the bottom level, the granule is a page, i.e., each entity represents a page. At the level above, each entity represents a set of pages, with a varying number of pages per set. (In the current version, the system user who creates these data pages decides which page set each goes into.) The granule at the next level up is a cluster, then a set of clusters, a set of sets of clusters, and so on. At the top level, there are from a few tens to a few hundreds of root entities, each root entity representing a large set of data pages.

The coordinator on the server maintains a set of ownership tables for clients and a non-owner table (as illustrated in Figure 6). The non-owner table lists the entities that are not currently owned by any client. We say "X owns entity Y" when the object manager X can grant read and write access to other clients for any data pages in Y, and the cache manager associated with X knows whether a data page in Y is cached and, if so, where. (Note that we do not assume all pages of Y are loaded in the cache of X.) The reason that we use the hierarchical model to manage cache coherence is that it can minimize the bookkeeping overhead per object manager. (Looking ahead, the hierarchical model also provides for more flexible ownership transfer.) When object manager A requests some data from the DynamO server, the coordinator checks whether there
Server Machine Client Machine Applications Coordinator Page lookup, ownership maintaince Client Machine Applications Object Manager Object-Pages Mapping Ownership Table Maintanence Cache Manager Page/ Segment Distributed Cache Management Page/Segment Local Cache Management Change of Ownership Page/Segment Object Manager Object-Pages Mapping Ownership Table Maintanence Cache Manager Page/ Segment Distributed Cache Management Page/Segment Local Cache Management isaclientwhoownstheobjectbeingrequested.ifthereisnosuchclient,theservermakesclientathe Figure4:DynamOlayers Page/ Segment Page/ Segment Disks Controllers I/O Manager Disk Block Allocation Strategies Page-Block Mapping Block Physical I/O: Blocks allocation File -> Extent otherclients. ownerofanentity,objectmanagerahastherighttograntread/writeaccesstoanydataintheentityto ismanagedviabuermanagementstrategiesspeciedinthecachemanagercodefromtheserver.asthe Ontheotherhand,ifthedataisownedbytheobjectmanagerB,thenthecoordinatorrefersAtoB. consistencyprotocolsondierentdatasets.)theobjectmanagercachesthedatainitslocalcachewhich theconsistencyprotocol,dierentservermayusedierentprotocols.moreover,aservermayusedierent owneroftheentity,sendsclientathehandlefortheentity,andmaketheproperentityintheownertable. downloadsthecachemanagercodefromtheserverifitdoesnothavethiscodealready.(thiscodeincludes IfobjectmanagerAretrievestheinformationfromthecoordinatorthatthedataisalreadyownedby Assuming,thatobjectmanagerAistheownerofacluster,itperformstheI/Oasdescribedabove,and Disks Btorequestaccesstothedata,andnegotiatesaccessrightsanddatagranularity. 
Ownership Transfer

When A accesses data currently owned by B, object managers A and B will also decide whether the ownership of the data should be transferred. If it is determined that an ownership transfer is desirable (e.g., object manager B thinks it will not need to access the data in the near future), then the question is what subset of the set owned by B should be transferred. To illustrate this process more clearly, let us assume that the entity in question is V', V' is a descendant of V, object manager B owns V, and object manager A requests data in V'.

[Figure 5: Hierarchical Object Organization (root-level entities decompose into sets of clusters of pages, clusters of pages, and sets of pages).]

[Figure 6: Owners Tables in the Coordinator and in Object Manager A (entities owned by each object manager, the non-owner entities, and per-entity read/write access rights).]

B decomposes V into a set of child entities, one of which contains V'. B then checks whether it needs this child entity. If yes, the child entity is decomposed in turn, and this process is repeated until there is an entity that is a descendant of V, contains V', and, it is estimated, will not be used by B in the near future. This entity is then the unit for ownership transfer. If no such entity exists, no ownership transfer will occur. This is only one possible scheme, with no claim of being optimal. For example, the algorithm described above does not account for the total size of the entities owned by B, the frequency of access of each entity, etc.

Call the transfer unit V1. If object manager B wants to transfer the ownership of V1 to A, it notifies the coordinator. In turn, the coordinator starts an ownership transfer process involving A and B. This process is similar to a two-phase commit, because it is very important that A, B, and the coordinator all agree that the transfer of ownership either occurs or does not. The result is that B releases entity V1 to A. Object manager B decomposes V into a set of disjoint entities V1, V2, ..., Vk, where V1 ∪ V2 ∪ ... ∪ Vk = V, and then removes V1 from this set. Now, instead of owning V, client B owns V2, V3, ..., Vk. Also, B sends all information associated with V1 (e.g., which clients have cached data in V1, which clients have read/write locks on data in V1, etc.) to A. There is a difference between being the owner of some data and locking the data: if a client wants to lock some data, it has to ask the owner of that data to grant the lock. The owner of the data is not necessarily locking the data itself; it serves the lock to others, e.g., by scheduling the lock requests.

Object manager A, in turn, updates its ownership table by adding V1. Then it prunes its ownership table to see whether it also owns V2, V3, ..., Vk. If it owns these objects, A removes V1, V2, ..., Vk from its ownership table, puts back V, and continues the pruning process until no more sub-objects can be removed. The goal of the pruning process is to keep a minimal list of objects in the ownership table. Then, A downloads the code for cache coherence from the server if A does not already have this code; when A requests the cache coherence code, the server simply returns the code to client A.

On the server side, after receiving the message of ownership change, the coordinator updates its own coordinator table, which contains both client A's and client B's ownership tables, using essentially the same procedure as A and B.

Distributed Cache Retrieval

After obtaining the proper access permission, object manager A can ask its distributed cache management layer for the data. A distributed cache management layer (DCML) maintains a list of the cached data of the entities owned by it. The DCML on client A will contact the DCML of the client machine, say B, which owns the data. (A can contact the coordinator, and the coordinator will tell it that B owns the data.) The DCML on client B knows which client, if any, caches the data. If the data is not cached, then B will send back a handle to the data file, page, and disk identifier(s), and a disk I/O has to be performed by client A; the local cache management layer (LCML) on A asks the proper network-attached disks for the data. On the other hand, if the DCML on client B discovers that another client has cached the data (say client C), then it sends a request to the DCML on client C with the file id and page ids. The DCML on client C will ask its LCML for the data, and the cached data is returned to client A.

After client A finishes processing the data, it can return the ownership of the entity to the server by sending a message of deletion of ownership to the coordinator and removing V1 from its ownership table. When the server receives a message of deletion of ownership, it removes the object from its owner table, puts it into the non-owner table, and prunes the non-owner table. This process is illustrated in Figure 7.

From the process described above, it is clear that when a client requests data from the coordinator, the coordinator will return the largest entity that contains the data and is not owned by any other client. In other words, if a client requests an object at level 5 (a "low" level), and there is a level 2 object in the non-owner table that is an ancestor of the object requested, then the server will give the level 2 object to the client rather than decompose the level 2 object and give the client a smaller object at a deeper level.

Recovery from a client machine crash

We now discuss the scenario that occurs when a client crashes. If the crashed client is an owner of entities, then other surviving clients may request the data owned by the crashed client machine, and of course these requests cannot be served. We apply a crash recovery protocol to eliminate this kind of problem. When an object manager requests data from another object manager, it has a timeout mechanism. If the other object manager does not respond within the timeout period, the requesting object manager will report to the coordinator that the other object manager may have crashed. The coordinator will then contact the specified object manager. If the object manager still does not respond, the coordinator will revoke all ownership of the specified object manager and assign its entities to the non-owner table. Also, during normal operation, each object manager writes logs and periodically puts them on the network-attached storage, so that after a crash the coordinator can fetch the log (the location of the log on the disks is known to the coordinator) and roll back the changes to the last consistent state. This procedure is similar to that used in distributed server failure recovery. When a client machine recovers from a crash, the object manager on that client will contact the coordinator to retrieve the data that it needs.
In this scheme, the work on the server machine is minimal. In fact, only the coordinator remains on the server, and the coordinator only tracks who is the owner of each entity. Therefore, the congested-server problem is eliminated. In addition, each object is owned by at most one owner, so cache coherency can be maintained by the owner; this avoids complicated distributed algorithms for cache updating. Moreover, the hierarchical object model makes the partition of cache management easy and fair. If one client accesses a large amount of data, it could serve a large amount of data to other clients. Furthermore, if an object manager requests some data from the coordinator, it may become the owner of a very large root entity. However, if other clients request data in this root entity, this client machine can transfer the ownership of some entities (which are descendants of the root entity) to other clients. As a result, the service partition is based on dynamic usage rather than on some static a priori partition, such as that used in xFS. Especially in an environment where machines can join and leave a network frequently (e.g., people bring their notebooks to the office and plug them into the network in the morning, and bring them home at night), a static partition may not work well.

[Figure 7: The process of data fetching. (a) Client A asks the server for an object; (b) if another client owns the object, the server returns the id of the owner, otherwise it makes A the owner and returns a handle; (c) Client A asks B for the data; (d) if the data is cached, Client B has the caching client give A a copy, otherwise B returns a handle or transfers ownership; (e) Client A goes to the correct disks and asks for the relevant blocks; (f) the data is returned.]

In addition, since there may be more than one server in a network, DynamO lets a client download the coherence control code from the server. A client can download different coherence control protocols from different servers for different objects, and all of them can work concurrently and correctly as long as there is only one owner and one coherence control protocol for a given object.

4.3 Application Interface

DynamO's application interface is similar to that of Exodus. DynamO offers variable-sized storage objects consisting of a storage identifier and an unstructured byte container. Storage objects can be clustered, and are changed in the scope of transactions (not described in this paper). For the application, the fact that storage server functionality is dynamically moved between machines is hidden. Since portability is an important issue and heterogeneous platforms do exist, the movement of code is based on the principles developed and explored in FALCON [She97].

5 Benchmark Analysis and Comparison

In this section, we have chosen three typical applications as testbeds for a performance comparison of DynamO with the traditional client-server model. These three applications of the persistent storage system are data/computation intensive persistent programming (IPP) applications, database applications, and multimedia applications. The estimated characteristics of these three applications are shown in Table 1.

Table 1: Characteristics of Three Applications

                            IPP             Database         Multimedia
  Number of Clients         Low (1-10)      High (20-40)     Low (1-10)
  I/O                       High (1000)     Medium (20-100)  Constant (20/s)
  CPU Cycles (in million)   High (1K-10K)   Medium (10-100)  Constant (50/s-200/s)
  Cache Hit Rate            Low (10%)       High (40%)       Very low (<3%)
  Cache Coherence Cost                                       None

In order to compare these two models fairly, we assume that there is one server in the client-server model and one coordinator/server in the DynamO model. Moreover, we assume that no data will be cached on the client local disks. In addition, in the traditional client-server architecture, we assume that the server always has to fetch the data from its disks through its PCI bus into its main memory if the data is not cached; the data then goes through the PCI bus again onto the network on the way to the client. On the other hand, when a DynamO client requests data that is not cached, the client sends a request to the network-attached disks; in turn, the disk controller fetches the data into its buffer and sends it to the network, and the data flows through the PCI bus on the client machine into the client's main memory. In this comparison, we assume all machines (the server machine and the client machines) are the same. Each machine is equipped with a 100 MIPS CPU and one PCI bus with a sustained bandwidth of 80 MB/sec. In addition, we do not consider the disks to be the bottleneck.

5.1 Data/Computation Intensive Persistent Programming

The path length of a typical IPP application consists of a sequence of data fetching and data processing phases, as shown in Figure 8. A data fetching phase is the time interval during which the client fetches the relevant data into its main memory, while the data processing phase is the time interval during which the client processes the fetched data; then the procedure repeats. We are interested in the average response time, i.e., the average time it takes a client to fetch data and process it.

[Figure 8: Measurement of average response time (alternating data fetching and data processing phases).]

In IPP applications, the number of I/Os and the CPU consumption are very high. Although the majority of the CPU load is on the client, the server processor still spends significant time per client job (approximately 2 seconds for each client job on average). Although the I/O request rate is very high, on average the PCI bus takes less than half a second.

Figure 9(a) shows the effect of an increase in the number of clients. Since the number of sessions increases, the overall CPU load increases, and once the server bus becomes congested, the average response time in the client-server model increases significantly. The average number of I/Os per client is 3,000, and the average number of CPU instructions is 1000 million, of which 20% are executed on the server and the remainder on the client in the client-server model. From this figure, it is clear that the server's CPU reaches saturation around 4 clients. In DynamO, however, the server/coordinator executes only 1 million instructions, since it only performs the role of coordinator, while the client executes 1050 million instructions.

Figures 9(b) and (c) show the performance as a function of CPU consumption and I/O requests, respectively. Since we set the number of clients to 6 in these two cases, the server's CPU is always saturated. Thus, when the CPU workload is increased, the average response time of the client-server model increases at a much faster pace than that of DynamO, because the bottleneck is the CPU rather than I/O. On the other hand, as the number of I/Os increases (as in Figure 9(c)), the performance of the client-server model degrades at a similar pace to that of DynamO, because the server machine's PCI bus is not the bottleneck.

[Figure 9: Performance of data/computation intensive persistent programming, for DynamO and the client-server model. (a) Average response time vs. number of clients; (b) average response time with 6 clients vs. average CPU cycles per processing phase (million); (c) average response time with 6 clients vs. average number of I/Os per fetching phase.]

5.2 Database Applications

In this environment, there are many clients, on the order of dozens. The cache hit rate can be very high in DynamO, since there are more machines that can serve as cache managers (i.e., each client machine can cache data for other clients). We assume that in the client-server model the cache hit ratio is 40%, while the cache hit rate is about 50% for DynamO. Moreover, since there is significant work done on the server, e.g., cache management, transaction management, I/O service, etc., the server processor can become saturated. In turn, performance is impacted when the number of clients increases (scalability) or when the workload increases. This is illustrated in Figure 10.

Figure 10(a) shows the relative scalability of the two models. We choose the average number of CPU cycles in a session to be 40 million. Among these cycles, 35% are executed on the server's processor while the rest are executed on the clients in the client-server model. In DynamO, 1 million instructions per session are executed on the server while the client executes 42 million instructions per session.

In this type of environment, the server CPU is the bottleneck. As a result, Figure 10(b) shows that when the average number of instructions per session increases, the performance of the client-server model is impacted severely. On the other hand, Figure 10(c) shows that as the number of I/Os per session increases, the average response time of an application in the client-server model increases at a similar pace to the increase in DynamO.

[Figure 10: Database applications, for DynamO and the client-server model. (a) Average response time vs. number of clients; (b) average response time with 30 clients vs. average CPU cycles per data processing phase (million); (c) average response time with 30 clients vs. average number of I/Os per data fetching phase.]

5.3 Multimedia Applications

Multimedia applications are a special class of applications: they require a constant stream of data. For example, MPEG-1 video requires 1.5 Mbits/sec while MPEG-2 video requires 4-6 Mbits/sec. The data decoding is done on the client machines, and there is no need for transaction management or coherence control because there is no write-back. Therefore, multimedia is an I/O intensive benchmark.

Multimedia applications are real-time applications, which require that a request be served within a specified time period. Therefore, instead of showing the average response time of an application, we show the number of multimedia streams that can be served by the client-server model and by DynamO in Figure 11. Since multimedia applications are data intensive and the network-attached storage has much higher aggregate bandwidth than the client-server model, DynamO can accommodate many more streams than the client-server model; e.g., DynamO can serve around 250 MPEG-2 streams compared to about 50 streams in the client-server model.

[Figure 11: Multimedia applications. Maximum number of multimedia streams vs. average bit rate of each stream (Mbit/s), for DynamO and the client-server model.]

6 Conclusion and Future Work

With new developments in computer hardware, such as improved processor speed and network bandwidth, as well as network-attached storage devices, there is also a need to reconsider software system architectures. In this paper, we introduced DynamO (Dynamic Objects with Persistent Objects), an alternative to the client-server architecture for computation/data intensive applications that offers significantly improved scalability and performance. Instead of managing a file system buffer on the server machine, DynamO downloads most server functionality to clients, and also transfers data directly from network-attached disks to client machines, thus eliminating the server bottleneck.

We studied the performance of the DynamO architecture. With temporal variation of application path length, DynamO has better adaptability because it dynamically changes the "client/server" compute-power ratio automatically according to the workload. Moreover, an added client machine can share not only the "server" workload but the "client" workload as well; therefore, better scalability can be achieved. Although DynamO has extra cache coherence overhead, the percentage overhead is low. We believe that DynamO provides a more cost-effective, scalable, and skew-insensitive solution than the traditional client-server architecture.

Implementation of DynamO is currently underway at UCLA's Data Mining Lab. DynamO is implemented in C++ on Sun UltraSparcs using Solaris. We have installed Fibre Channel devices, including four Seagate Fibre Channel disk drives and Fibre Channel adaptors for the Sun workstations, which we use as the hardware basis for DynamO.

References

[And96] T. E. Anderson, M. D. Dahlin, J. M. Neefe, D. A. Patterson, et al. Serverless Network File Systems. ACM Transactions on Computer Systems, vol. 14, no. 1, pp. 41-79, February 1996.

[Arp97] A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, D. E. Culler, J. M. Hellerstein, and D. A. Patterson. High-Performance Sorting on Networks of Workstations. Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 243-254, May 1997.

[Blo96] R. Bloor. The Coming of the Thin Client. Database and Network Journal, vol. 26, no. 4, pp. 2-4, August 1996.

[Car86] M. J. Carey, D. J. DeWitt, J. E. Richardson, and E. J. Shekita. Object and File Management in the EXODUS Extensible Database System. Proceedings of the Twelfth International Conference on Very Large Data Bases, 1986.

[Dias89] D. M. Dias, B. R. Iyer, J. T. Robinson, and P. S. Yu. Integrated Concurrency-Coherency Controls for Multisystem Data Sharing. IEEE Transactions on Software Engineering, vol. 15, no. 4, April 1989.

[Fab97] F. Fabbrocino, E. C. Shek, and R. R. Muntz. The Design and Implementation of the Conquest Query Execution Environment. UCLA CSD Technical Report #970029, July 1997.

[FCA] Fibre Channel Association. http://www.amdahl.com/ext/CARP/FCA/FCA.html.

[Gib97] G. A. Gibson, D. F. Nagle, K. Amiri, and F. W. Chang. File Server Scaling with Network-Attached Secure Disks. Performance Evaluation Review, vol. 25, no. 1, pp. 272-284, June 1997.

[Gro96] E. Grochowski and R. F. Hoyt. Future Trends in Hard Disk Drives. IEEE Transactions on Magnetics, vol. 32, no. 3, pt. 2, pp. 1850-1854, May 1996.

[Hei88] P. Heidelberger and M. S. Lakshmi. A Performance Comparison of Multimicro and Mainframe Database Architectures. IEEE Transactions on Software Engineering, vol. 14, no. 4, April 1988.

[Hei95] J. Heidemann and G. Popek. Performance of Cache Coherence in Stackable Filing. Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, pp. 127-142, December 1995.

[Mos88] J. E. B. Moss and S. Sinofsky. Managing Persistent Data with Mneme: Designing a Reliable Shared Object Interface. Advances in Object-Oriented Database Systems: Second International Workshop on OODBS, Bad Münster, Germany, 1988.

[Nit96] S. Nittel and K. R. Dittrich. A Storage Server for the Efficient Support of Complex Objects. Proceedings of the Seventh International Workshop on Persistent Object Systems (POS-7), Cape May, June 1996.

[Pat88] D. A. Patterson, G. A. Gibson, and R. H. Katz. A Case for Redundant Arrays of Inexpensive Disks (RAID). Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 109-116, May 1988.

[She97] E. Shek, R. R. Muntz, and L. Fillion. The Design of the FALCON Framework for Application Level Communication Optimization. Technical Report No. 960039, Computer Science Department, UCLA, February 1995.

[Sun97] Sun Microsystems. JavaBeans. http://java.sun.com/beans.

[Tan95] T. Tannenbaum and M. Litzkow. The Condor Distributed Processing System. Dr. Dobb's Journal, November 1996.