CapNet: A Real-Time Wireless Management Network for Data Center Power Capping

Abusayeed Saifullah, Sriram Sankar, Jie Liu, Chenyang Lu, Ranveer Chandra, and Bodhi Priyantha
Washington University in St. Louis, Missouri 63130, USA (currently with Missouri University of Science & Technology)
Microsoft Corporation, Redmond, Washington 98052, USA
Microsoft Research, Redmond, Washington 98052, USA

Abstract
Data center management (DCM) is increasingly becoming a significant challenge for enterprises hosting large-scale online and cloud services. Machines need to be monitored, and the scale of operations mandates automated management with high reliability and real-time performance. Existing wired networking solutions for DCM come at high cost. In this paper, we propose a wireless sensor network as a cost-effective networking solution for DCM that satisfies the reliability and latency requirements of DCM. We have developed CapNet, a real-time wireless sensor network for power capping, a time-critical DCM function for power management in a cluster of servers. CapNet employs an efficient event-driven protocol that triggers data collection only upon the detection of a potential power capping event. We deploy and evaluate CapNet in a data center. Using server power traces, our experimental results on a cluster of 480 servers inside the data center show that CapNet can meet the real-time requirements of power capping. CapNet demonstrates the feasibility and efficacy of wireless sensor networks for time-critical DCM applications.

I. INTRODUCTION
The continuous, low-cost, and efficient operation of a data center heavily depends on its management network and system. A typical data center management (DCM) system handles physical-layer functionality such as powering a server on/off, motherboard sensor telemetry, cooling management, and power management. Higher-level management capabilities such as system re-imaging, network configuration, (virtual) machine assignment, and server health monitoring [1], [2] depend on DCM to work correctly. DCM is expected to function even when the servers do not have a working OS or the data network is not configured correctly [3].

Today's DCM is typically designed in parallel to the production data network (in other words, out of band), with a combination of Ethernet and serial connections for increased redundancy. There is a cluster controller for a rack or a group of racks, and the controllers are connected through Ethernet to a central management server. Within the clusters, each server has a motherboard microcontroller (BMC, Baseboard Management Controller) that is connected to the cluster controller via point-to-point serial connections. For redundancy, every server is typically connected to two independent controllers on two different fault domains, so there is at least one way to reach the server under any single point of failure.

Unfortunately, this architecture does not scale. The overall cost of the management network increases super-linearly with the number of servers in a data center. At the same time, massive cabling across racks increases the chance of human error and prolongs server deployment latency. This paper presents a different approach to the data center management network at the rack granularity, replacing serial cable connections with low-cost wireless links. Low-power wireless sensor network technology such as IEEE 802.15.4 has intrinsic advantages in this application.

Cost: Low-power radios (e.g., IEEE 802.15.4) are cheaper individually than wired alternatives, and their cost scales linearly with the number of servers.
Embedded: These radios can be physically small and can be integrated onto the motherboard to save precious rack space.
Reconfigurability: Wireless sensor networks can be self-configuring and self-repairing over the broadcast medium, preventing human cabling errors.
Low power: With a small on-board battery, a wireless DCM can continue to function, providing monitoring capabilities even when the rack experiences a power supply failure.

However, whether a wireless DCM can meet the high reliability requirements of data center operation is not obvious, for several reasons. The amount of sheet metal, electronics, and cables may completely shield RF signal propagation within racks. Furthermore, although typical traffic on a DCM is low, emergency situations may need to be handled in real time, which could require the design of new protocols.

Power capping is an example of an emergency event that imposes real-time requirements. Today, data center operators commonly oversubscribe the power infrastructure by installing more servers on an electric circuit than it is rated for. The rationale is that servers seldom reach their peaks at the same time. Through over-subscription, the same data center infrastructure can host more servers than otherwise. In the rare event that the aggregate power consumption of all servers exceeds the circuit's power capacity, some servers must be slowed down (i.e., power capped), through dynamic voltage and frequency scaling (DVFS) or CPU throttling, to prevent the circuit breaker from tripping. Every magnitude of oversubscription is associated with a trip time, which is a deadline by which power capping must be performed to avoid circuit breaker tripping.

This paper studies the feasibility and advantages of using low-power wireless for DCM. In two data centers, we empirically evaluate IEEE 802.15.4 link qualities in server racks to
show that the overall packet reception rate is high. We further dive into the power capping scenario and design CapNet, a wireless Network for power Capping, which employs an event-driven real-time control protocol for power capping over wireless DCM. The protocol uses distributed event detection to reduce the overhead of regularly polling all nodes in the network. Hence, the network throughput can be used by other management tasks when there is no emergency. When a potential power surge is detected, the controller uses a sliding-window, collision-avoidance approach to gather power measurements from all servers, and then issues power capping commands to a subset of them. We deployed and evaluated CapNet in a data center. Using server power traces, our experimental results on a cluster of 480 servers in the data center show that CapNet can meet the real-time requirements of power capping. It demonstrates that power capping is feasible and effective over wireless, as with a wired DCM, at a fraction of the cost.

II. THE CASE FOR WIRELESS DCM (CAPNET)
Typical wired DCM solutions in data centers scale poorly with an increasing number of servers. The serial-line based point-to-point topology incurs additional costs as we connect more servers together. Here, we compare the costs of wired DCM to our proposed wireless solution (CapNet) by considering the cost of the management network, and by measuring the quality of in-rack wireless links.

A. Cost Comparison with Wired DCM
To compare hardware cost, we consider the cost of the DiGi switches ($397/48-port [4]), controller cost (approx. $5/rack [5]), cable cost ($2/cable [6]), and additional management network switches ($3/48-port on average [7]). We do not include the labor or management costs of cabling, for simplicity of the costing model, but note that these costs are also significant for wired DCMs. We assume that there are 48 servers per rack, and that there can be up to 100,000 servers that need to be managed, which is typical for large data centers.

For the wireless CapNet solution, we assume IEEE 802.15.4 (ZigBee) technology for its low cost. The cost of network switches at the top level stays the same, but the cost of the DiGi switches can be significantly reduced. We assume a small fixed cost per wireless controller, which is essentially an Ethernet-to-ZigBee relay. For the wireless receivers on the motherboard, we assume $5 per server for the RF chip and antenna, as the motherboard controller is already in place [8].

TABLE I. System cost (in US dollars) comparison and scalability. Columns: # of servers, Wired-N, Wired-2N, CapNet-N, CapNet-2N.

We develop a simple cost model based on these individual costs and compute the total number of devices needed for implementing management over a number of servers ranging up to 100,000 (in order to capture how cost scales with the number of servers). We consider solutions along two dimensions: 1) wired vs. wireless, and 2) N-redundant vs. 2N-redundant (a 2N-redundant system consists of two independent switches, DiGis, and paths through the management system). Table I shows the cost comparison across these solutions. We see that a wired N-redundant DCM solution (Wired-N) for 100,000 servers costs 12.5 times as much as a wireless N-redundant DCM solution (CapNet-N). If we increase the redundancy of the management network to 2N, the cost of the wired solution (between Wired-2N and Wired-N) doubles. In contrast, the cost of the wireless solution increases only by 36% (due to 2N controllers and 2N switches at the top level). The resulting cost of Wired-2N is 18.4 times that of CapNet-2N.
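The cost model is a straightforward tally of per-unit prices. The C sketch below illustrates its structure only; the unit prices and the exact redundancy accounting are placeholders chosen for illustration, not the figures behind Table I.

/* Illustrative sketch of a simple DCM cost model: wired vs. CapNet,
 * N- vs. 2N-redundant. All unit prices are placeholder assumptions. */
#include <math.h>
#include <stdio.h>

#define SERVERS_PER_RACK 48
#define PORTS_PER_SWITCH 48

/* Wired DCM: serial console servers + per-rack controllers + serial
 * cables + management Ethernet switches; 2N duplicates every path. */
static double wired_cost(long servers, int redundancy)
{
    long racks    = (long)ceil((double)servers / SERVERS_PER_RACK);
    long consoles = (long)ceil((double)servers / PORTS_PER_SWITCH);
    long switches = (long)ceil((double)consoles / PORTS_PER_SWITCH);
    double one_path = consoles * 4000.0    /* console server per 48 ports (assumed) */
                    + racks    * 500.0     /* cluster controller per rack (assumed)  */
                    + servers  * 20.0      /* serial cable per server (assumed)      */
                    + switches * 3000.0;   /* management switch per 48 ports (assumed) */
    return one_path * redundancy;
}

/* CapNet: one radio per server plus an Ethernet-to-ZigBee relay per
 * rack; only relays and top-level switches are duplicated for 2N. */
static double capnet_cost(long servers, int redundancy)
{
    long racks    = (long)ceil((double)servers / SERVERS_PER_RACK);
    long switches = (long)ceil((double)racks / PORTS_PER_SWITCH);
    return servers * 5.0                                /* RF chip + antenna          */
         + redundancy * (racks * 100.0                  /* wireless relay (assumed)   */
                         + switches * 3000.0);          /* top-level switch (assumed) */
}

int main(void)
{
    long n = 100000;
    printf("Wired-N: %.0f  CapNet-N: %.0f  ratio: %.1f\n",
           wired_cost(n, 1), capnet_cost(n, 1),
           wired_cost(n, 1) / capnet_cost(n, 1));
    return 0;
}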
Given the significant cost difference between wired DCM and CapNet, we next explore whether wireless is feasible for communication within racks.

B. Choice of Wireless: IEEE 802.15.4
We are particularly interested in low-bandwidth wireless such as IEEE 802.15.4 instead of IEEE 802.11, for a number of reasons. First, the payload size for data center management is small, and hence the bandwidth of a ZigBee (IEEE 802.15.4) network is sufficient for control-plane traffic. Second, in WiFi (IEEE 802.11) there is a limit on how many nodes an access point can support in infrastructure mode, since it has to maintain an IP stack for every connection, and this impacts scalability in a dense deployment. Third, to support management features, the data center management system should still work when the rack is unpowered; a small backup battery can power ZigBee longer, at much higher energy efficiency. Finally, the ZigBee communication stack is simpler than WiFi, so the motherboard (BMC) microcontroller can remain simple. Although we do not rule out other wireless technologies, we chose to prototype with ZigBee in this paper.

C. Radio Environment inside Racks
We did not find any previous study that evaluated signal strength within racks, through servers and sheet metal. The sheet metal inside the enclosure is known to weaken radio signals, creating a harsh environment for radio propagation inside racks. RACNet [9] studied wireless characteristics in data centers, but only across racks when all radios are mounted
at the top of the rack. Therefore, we first performed an in-depth 802.15.4 link-layer measurement study of in-rack radio propagation inside a data center of Microsoft Corporation.

Setup. The data center used for the measurement study has racks that consist of multiple chassis in which servers are housed. A chassis is organized into two columns of sleds. In all experiments, one TelosB mote is placed on top of the rack (ToR), inside the rack enclosure. The other motes are placed at different locations in a chassis in different experiments. Figure 1 shows the placement of 8 motes inside a bottom sled (which is open in the figure but was closed during the experiment).

Fig. 1. Motes placed in a bottom sled.

While measuring the downward link quality, the node on the ToR is the sender and the nodes in the chassis receive. We then reverse the sender and the receivers to measure the upward link quality. In each setup, the sender transmits packets at 4 Hz. The payload size of each packet is 29 bytes. Through a week-long test capturing the long-term variability of links, we collected signal strengths and packet reception rates (PRR).

Fig. 2. Downward signal strength and PRR in the bottom sled: (a) RSSI when Tx power varies (channel 26); (b) RSSI on various channels (Tx power -3 dBm); (c) PRR.

Results. Figure 2(a) shows the cumulative distribution function (CDF) of Received Signal Strength Indicator (RSSI) values at a receiver inside the bottom sled for transmissions from the node on the ToR at different transmission (Tx) powers using IEEE 802.15.4 channel 26. For -7 dBm or higher Tx power, RSSI is greater than -70 dBm in 100% of cases. RSSI values in ZigBee receivers lie roughly in the range of -100 to 0 dBm. A previous study [10] on ZigBee shows that when the RSSI is above -87 dBm (approx.), the PRR is at least 85%. As a result, we see that the signal strength at the receiver in the bottom sled is quite strong. Figure 2(b) shows the CDF of RSSI values at the same receiver for transmissions from the node on the ToR on different channels at a Tx power of -3 dBm. Both figures indicate strong signal strength, and in each experiment the PRR was at least 94% (Figure 2(c)). We observed similar results in all other setups of the measurement study and omit those results.

The measurement study reveals that low-power wireless, such as IEEE 802.15.4, is viable for communication within data center racks and can be reliable for telemetry purposes. We now focus on the power capping scenario and the CapNet design for real-time power capping over wireless DCM.

III. CAPNET DESIGN OVERVIEW
Power infrastructure represents a huge capital investment for a data center, up to 40% of the total cost of a large data center that can cost hundreds of millions of US dollars [11]. Hence, it is desirable to use the provisioned infrastructure up to its maximum rated capacity. The capacity of a branch circuit is provisioned at design time, based on upstream transformer capacity during normal operation or UPS/generator capacity when running on backup power. To improve data center utilization, a common practice in enterprise data centers is over-subscription [12]-[15]. This method allocates servers to a circuit exceeding its rated capacity (i.e., cap), since not all servers reach their maximum power consumption at the same time. Hence, there is a circuit breaker (CB) that trips to protect the expensive equipment.
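To make the over-subscription rationale concrete, the short example below compares how many servers a branch circuit can host when provisioned for nameplate peak power versus a high percentile of observed power; all numbers here are illustrative assumptions, not measurements from the paper.

/* Illustration of power oversubscription: provisioning a branch
 * circuit for a high percentile of observed power instead of the
 * nameplate peak lets it host more servers. Numbers are assumed. */
#include <stdio.h>

int main(void)
{
    double circuit_capacity_w = 60000.0; /* branch circuit rating (assumed)        */
    double nameplate_peak_w   = 300.0;   /* per-server worst-case draw (assumed)   */
    double p95_power_w        = 200.0;   /* 95th-percentile per-server draw (assumed) */

    int conservative   = (int)(circuit_capacity_w / nameplate_peak_w);
    int oversubscribed = (int)(circuit_capacity_w / p95_power_w);

    printf("provisioned for peak: %d servers\n", conservative);        /* 200 */
    printf("oversubscribed (95th pct): %d servers\n", oversubscribed); /* 300 */
    printf("oversubscription magnitude if all peak: %.2f\n",
           oversubscribed * nameplate_peak_w / circuit_capacity_w);    /* 1.50 */
    return 0;
}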
Fig. 3. The trip curve of a Rockwell Allen-Bradley 1489-A circuit breaker at 40 C [16]. The X-axis is the oversubscription magnitude; the Y-axis is the trip time. The band separates the not-tripped region from the tripped region (long-delay, conventional tripping, and short-circuit regions).

Peak power consumption above the cap has a specified time limit, called a trip time, depending on the magnitude of over-subscription (as shown in Figure 3 for the Rockwell Allen-Bradley 1489-A circuit breaker). If the over-subscription continues for longer than the trip time, the CB will trip and cause undesired server shutdowns and power outages, disrupting data center operation. Power capping is the mechanism that brings the aggregate power consumption back under the cap. An overload condition under practical current draw trips the CB on a time scale ranging from several hundred milliseconds to hours, depending on the magnitude of the overload [16]. These trip times are the deadlines, for the corresponding oversubscription magnitudes, within which power capping must be completed to prevent CB tripping and thus avoid power loss or damage to expensive equipment.

A. The Power Capping Problem
To enable power capping for a rack or cluster, a power capping manager (also called the controller) collects all servers' power consumption and determines the cluster-level aggregate
power consumption. If the aggregate consumption is over the cap, the manager generates control messages asking a subset of the servers to reduce their power consumption through CPU frequency modulation (and voltage, if using DVFS) or utilization throttling. The application-level quality of service may require different servers to be capped at different levels, so the central controller needs all individual server readings. In some graceful throttling policies, the control messages are delivered by the BMC to the host OS or VMs, which introduces additional latency due to the OS stack [14], [17]. To avoid abrupt changes in application performance, the controller may change the power consumption incrementally and require multiple iterations of the feedback control loop before the cluster settles down to below the power cap [14], [18]. These control policies have been studied extensively in previous work and are out of the scope of this paper.

B. Power Capping over Wireless DCM
Servers in a data center are stacked and organized into racks. One or more racks can comprise a power management unit, called a cluster. Figure 4 shows the wireless DCM architecture inside a data center.

Fig. 4. Wireless DCM architecture: a cluster of 3 racks, each with a ToR node, a wireless power capping manager, and servers with wireless transceivers, among rows of racks in the data center.

All servers in a cluster incorporate a wireless transceiver that connects to the BMC microcontroller. Each server is capable of measuring its own power consumption. A cluster power capping manager can either directly measure the total power consumption using a power meter or, to achieve fine-grained power control, aggregate the power consumption readings from individual servers. We focus on the second case due to its flexibility. When the aggregate power consumption approaches the circuit capacity, the manager issues capping commands over wireless links to individual servers. The main differences compared to a wired DCM are the broadcast wireless medium and the challenge of scheduling communication to meet the real-time demands.

To reduce coordination and to enable spatial spectrum reuse, we assume a single IEEE 802.15.4 channel for communication inside a cluster. Using multiple channels, multiple clusters can run in parallel. Channel allocation can be done using existing protocols that minimize inter-cluster interference (e.g., [19]) and is not the focus of this paper. For protocol design, we focus on a single cluster of n servers.

C. A Naive Periodic Protocol
A naive approach to a fine-grained power capping policy is to always monitor the servers by periodically collecting the power consumption readings from individual servers. The manager periodically computes the aggregate power. Whenever the aggregate power exceeds the cap, it generates a control message. Upon finishing the aggregation and control in η iterations, it resumes the periodic aggregation.

D. Event-Driven CapNet
Oversubscribed data centers may provision for the 95th (or higher) percentile of peak power and require capping for 5% (or less) of the time, which may be an acceptable hit on performance relative to the cost savings [17]. Thus power capping is a rare event, and the naive periodic protocol is overkill, as it saturates the wireless medium by always preparing for the worst case; other delay-tolerant telemetry messages cannot get enough network resources. An ideal wireless protocol should generate significant traffic only when a significant power surge occurs. Therefore, CapNet employs an event-driven policy that is designed to trigger the power capping control operation only when a potential power capping event is predicted.
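A back-of-the-envelope comparison makes the contrast concrete: under the naive protocol the channel is busy with a full collection in every period, whereas the event-driven protocol pays that cost only in the small fraction of intervals in which a surge is suspected. Every parameter below is an illustrative assumption.

/* Rough channel-usage comparison: naive periodic polling vs.
 * event-driven collection. All parameters are illustrative assumptions. */
#include <stdio.h>

int main(void)
{
    double n_servers      = 480.0;  /* servers in the cluster (assumed)          */
    double per_reading_s  = 0.025;  /* airtime budget per server reading (assumed) */
    double poll_period_s  = 30.0;   /* polling period of the naive protocol (assumed) */
    double surge_fraction = 0.05;   /* fraction of intervals with a suspected surge */

    double collection_s  = n_servers * per_reading_s;      /* one full collection */
    double periodic_duty = collection_s / poll_period_s;   /* busy fraction       */
    double event_duty    = periodic_duty * surge_fraction; /* only on suspected surges */

    printf("periodic: %.0f%% of airtime spent collecting readings\n",
           100 * periodic_duty);
    printf("event-driven: %.1f%% of airtime (plus occasional heartbeats)\n",
           100 * event_duty);
    return 0;
}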
Due to the rareness and emergency nature of power surges, the network can suspend other activities to handle power capping. The protocol provides real-time performance and a sustainable degree of reliability without consuming much network resource. The details of the protocol are explained in the next section.

IV. POWER CAPPING PROTOCOL
We design a distributed event detection policy in which we assign a local cap to each individual server, derived from the global (cluster-level) cap. When a server observes a local power surge based on its own power reading, it can trigger the collection of the power consumption of all servers to detect a potential surge in the aggregate power consumption of the cluster. If a cluster-level power surge is detected, the system initiates a power capping action.

As many servers can simultaneously exceed their local caps, a standard CSMA/CA protocol can suffer significant packet loss due to excessive contention and collisions. Similarly, a slot-stealing TDMA (Time Division Multiple Access) protocol such as Z-MAC [20] would suffer from the same problem, as those servers would try to steal slots simultaneously. Furthermore, pure TDMA-based protocols do not fit our problem well, since they need a predefined communication schedule for all nodes. Finally, as the aggregate power consumption can be quite dynamic, it may be infeasible to predict an upcoming power peak based on historical readings. This observation leads us to avoid a predictive protocol that proactively schedules data collection based on historical power readings.

While global detection is possible by monitoring at the branch circuit level alone, say using a power meter, that cannot support fine-grained and flexible power capping policies, such as those based on individual server priority or on reducing the power of individual servers based on their power consumption. A centralized measurement also introduces a single point of failure: if the power meter fails, power oversubscription fails as well. In contrast, our distributed approach is more resilient to failure. If an individual measurement fails, the system can
always assume the maximum power consumption at that server and keep the whole cluster going.

The event-driven protocol runs in 3 phases, as illustrated in Figure 5: detection, aggregation, and control. The event detection phase generates alarms based on local power surges. Upon detecting a potential event, CapNet runs the second phase, which invokes a power aggregation protocol. A false detection may happen when some servers generate alarms by exceeding their local caps while the aggregate value is still under the cap. This is corrected in the aggregation phase, where the controller determines the aggregate power consumption; the impact of a false positive is that the system runs the aggregation phase, which incurs additional wireless traffic. The control phase is executed only if the alarms are true.

Fig. 5. CapNet's event-driven protocol flow diagram: in each detection interval the manager sends a heartbeat and server i sends an alarm in its slot if p_i exceeds its local cap; if condition (1) is satisfied, all server readings are collected (aggregation phase); if p_agg > c, the control phase runs for up to η iterations, otherwise the alarm is false.

We normalize each server's power consumption value between 0 and 1 by dividing its instantaneous power consumption by the maximum power consumption of an individual server. This normalized power consumption value of server i is denoted by p_i, where 0 ≤ p_i ≤ 1, and is used in this paper as the server's power consumption. The cap of a cluster of n servers is denoted by c, and the total power consumption of the n servers is considered the aggregate power consumption, denoted by p_agg.

Assigning local caps. If p_agg > c, a necessary condition is that some servers' (at least one's) individual power consumption values locally exceed the value c/n. Therefore, one possible approach is to assign c/n as each server's local cap. However, there can be situations where only one server exceeds c/n while all other servers are under c/n, thereby triggering an aggregation phase upon a single server's alarm. As a result, this policy would generate many false alarms. Therefore, to suppress false alarms, we assign a slightly smaller local cap and consider alarms from multiple servers before starting the aggregation phase. Thus we use a value 0 < α ≤ 1 close to 1 and assign αc/n as the local cap of each server. A server i reports an alarm if p_i > αc/n.

Each server is assigned a unique ID i, where i = 1, 2, ..., n. The manager broadcasts a heartbeat packet every h time units, called the detection interval. The detection interval of length h is slotted into n slots, with each slot's length being h/n. The value of h is selected so that a slot is long enough to accommodate one transmission and its acknowledgement. After receiving the heartbeat message, the server clocks are synchronized.
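The per-server behavior just described is simple enough to sketch directly. The C sketch below assumes the local cap αc/n and the slotting above; the types, helper functions, and radio/timer hooks are hypothetical placeholders rather than CapNet's actual NesC implementation.

/* Sketch of a server's behavior in CapNet's detection phase.
 * On each heartbeat the detection interval of length h is divided into
 * n slots of h/n; server i samples its normalized power p_i in slot i
 * and sends an alarm only if p_i exceeds its local cap alpha*c/n.
 * The radio/sensor hooks are placeholders. */
#include <stdint.h>

typedef struct {
    uint16_t id;     /* unique server ID i in 1..n     */
    uint16_t n;      /* number of servers in the cluster */
    double   alpha;  /* 0 < alpha <= 1, close to 1      */
    double   cap;    /* cluster-level cap c (normalized) */
    uint32_t h_ms;   /* detection interval length h in ms */
} capnet_server_t;

/* Placeholder hooks; a real node would use its ADC and radio stack. */
extern double sample_normalized_power(void);        /* returns p_i in [0,1]   */
extern void   send_alarm(uint16_t id, double p_i);  /* ack of the heartbeat   */

/* Called when the manager's heartbeat is received; returns the delay
 * (ms) until this server's own slot within the detection interval. */
uint32_t slot_offset_ms(const capnet_server_t *s)
{
    uint32_t slot_len = s->h_ms / s->n;       /* each slot is h/n          */
    return (uint32_t)(s->id - 1) * slot_len;  /* server i owns the i-th slot */
}

/* Called at the server's own slot. */
void on_detection_slot(const capnet_server_t *s)
{
    double local_cap = s->alpha * s->cap / s->n;
    double p_i = sample_normalized_power();
    if (p_i > local_cap)
        send_alarm(s->id, p_i);  /* otherwise stay silent */
}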
A. Detection Phase
Each node i, 1 ≤ i ≤ n, takes its sample (i.e., power consumption value p_i) at the i-th slot of the detection phase. If its reading is over the local cap, i.e., p_i > αc/n, it generates an alarm and sends the reading (p_i) to the manager as an acknowledgement of the heartbeat message. Otherwise, it ignores the heartbeat message and does nothing. If an alarm is received at the s-th slot, the manager determines, based on whether the network is reliable or not, whether an aggregation phase has to be started. Let the set of servers that have sent alarms in the current detection window so far be denoted by A.

Reliable Network. Let an alarm be generated in the s-th slot of a detection interval. Considering a reliable network, we can assume that no server message was lost. Therefore, each of the other s − |A| servers among the first s servers has a power consumption reading of at most αc/n, as it has not generated an alarm. Each of the remaining n − s servers can have a power consumption value of at most 1. Thus, based on the alarm at the s-th slot, the manager can estimate an aggregate power of Σ_{j∈A} p_j + (s − |A|)·αc/n + (n − s). Hence, if an alarm is generated at the s-th slot, the manager will start the aggregation phase if

Σ_{j∈A} p_j + (s − |A|)·αc/n + (n − s) > c.    (1)

Unreliable Network. Now we consider a scenario where some server alarms may have been lost. In this case, if an alarm is generated in the s-th slot of a detection window, each of the other s − |A| servers among the first s servers may have a power consumption reading of at most 1, as its alarm is assumed to be lost. Therefore, each of the n − |A| servers not in A can have a power consumption of at most 1, making an estimated aggregate power of Σ_{j∈A} p_j + (n − |A|). Thus, if an alarm is generated in the s-th slot, the manager will start the aggregation phase if

Σ_{j∈A} p_j + (n − |A|) > c.    (2)

If there are no alarms in the detection phase, or all alarm messages were lost due to transmission failures, the controller resumes the next detection phase (to detect surges again using the same mechanism) when the current phase is over.

B. Aggregation Phase
To minimize aggregation latency, CapNet adopts a sliding-window based protocol to determine the aggregate power consumption, denoted by p_agg. The controller uses a window of size ω. At any time, it selects ω servers (or, if there are fewer than ω servers whose readings have not yet been collected, all of them) in round-robin fashion; these servers will send their readings consecutively in the next window. The ω server IDs are ordered in a message. At the beginning of the window, the controller broadcasts this message and starts a timer of length τ_d + ω·τ_u after the broadcast, where τ_d denotes the maximum downward communication time (i.e., the maximum time required for a controller's packet to be delivered to a server) and τ_u denotes the maximum upward
communication time (server to controller). Upon receiving the broadcast message, any server whose ID is at position i, 1 ≤ i ≤ ω, in the message transmits its reading after (i − 1)·τ_u time. Other servers ignore the message. When the timer fires or packets from all ω nodes have been received, the controller creates the next window of ω servers that are yet to be scheduled or whose packets were missed (in the previous window). A server is scheduled in at most γ consecutive windows to handle transmission failures, where γ is the worst-case ETX (expected number of transmissions for a successful delivery) in the network. The procedure continues until all server readings are collected or every remaining server has been retried γ times.

C. Control Phase
Upon finishing the aggregation phase, if p_agg > c, where c is the cap, the controller starts the control phase. The control phase generates a capping control command using a control algorithm, and the controller then broadcasts the message requesting a subset of the servers to be capped. To handle broadcast failures, it repeats the broadcast γ times (since the broadcast is not acknowledged). The servers react to the capping messages with DVFS or CPU throttling, which incurs an operating system (OS) level latency as well as a hardware-induced delay [17]. If the control algorithm requires η iterations, then after the capping control command is executed in the first round, the controller again runs the aggregation phase to reconfirm that capping was done correctly. The procedure iterates for up to (η − 1) more iterations. Upon finishing the control, or after the aggregation phase upon a false alarm, the protocol resumes the detection phase.

D. Latency Analysis
Given the time-criticality of power capping, it is important for CapNet to achieve bounded latency. Here, we provide an analytical upper bound on CapNet's power capping latency, which consists of the detection phase latency, the aggregation latency, the OS-level latency, and the hardware latency. In practice, the actual latency is usually lower than the bound. The analysis can be used by system administrators to configure the cluster so that power capping meets its timing constraints.

Aggregation latency. For n servers in the cluster, the total aggregation delay L_agg under no transmission failure can be upper bounded as follows. Each window of ω transmissions can take at most (ω·τ_u + τ_d) time units. There can be at most ⌊n/ω⌋ windows in which ω servers transmit. The last window then takes only ((n mod ω)·τ_u + τ_d) time to accommodate the remaining (n mod ω) servers. Hence,

L_agg ≤ ⌊n/ω⌋·(ω·τ_u + τ_d) + ((n mod ω)·τ_u + τ_d).

Considering γ as the worst-case ETX in the network,

L_agg ≤ (⌊n/ω⌋·(ω·τ_u + τ_d) + ((n mod ω)·τ_u + τ_d))·γ.    (3)

The above value is only an analytical upper bound; in practice the latency can be much shorter.

Latency in the detection phase. The time spent in the detection phase is denoted by L_det. In a detection window the protocol never needs the readings from the last ⌊c⌋ servers, as an aggregation phase must have started before then if power capping is needed (assuming that not all alarms were lost). Therefore, the alarms generated within the first (n − ⌊c⌋ + 1) slots must trigger the aggregation phase. Hence,

L_det ≤ (h/n)·(n − ⌊c⌋ + 1).    (4)

Total power capping latency. To handle a power capping event, a detection phase and an aggregation phase are followed by a control message that is broadcast γ times and takes τ_d·γ time. In addition, once the control message reaches a server, there is an operating-system-level latency, and after the processor frequency changes, there is a hardware-induced delay. Let the worst-case OS-level latency and hardware-level latency be denoted by L_os and L_hw, respectively. Thus, the total power capping latency of one iteration, denoted by L_cap, is bounded as

L_cap ≤ L_det + L_agg + τ_d·γ + L_os + L_hw.

An η-iteration control means that once the power capping command is executed, the controller must again collect all readings from the n servers and reconfirm that capping was done correctly, in up to (η − 1) more iterations. Therefore, for η-iteration control, the above bound becomes

L_cap ≤ L_det + (L_agg + τ_d·γ + L_os + L_hw)·η.    (5)
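The trigger test of condition (1) and the bounds (3)-(5) can be written down directly. The C sketch below does so; every numeric value in main() is an illustrative assumption rather than a measured deployment figure, and the function names are hypothetical.

/* Sketch of two manager-side computations from Section IV: the
 * reliable-network trigger test of condition (1) and the latency
 * bounds of (3)-(5). All parameters in main() are assumptions. */
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

/* Condition (1): sum_alarms is the sum of p_j over the set A of servers
 * that alarmed, num_alarms = |A|, and s is the slot of the latest alarm. */
bool should_aggregate(double sum_alarms, int num_alarms, int s,
                      int n, double alpha, double c)
{
    double estimate = sum_alarms
                    + (s - num_alarms) * (alpha * c / n)  /* silent among first s */
                    + (n - s) * 1.0;                      /* not yet sampled      */
    return estimate > c;
}

/* Latency bounds of (3), (4) and (5), in milliseconds. */
double aggregation_bound(int n, int w, double tau_u, double tau_d, int gamma)
{
    return ((n / w) * (w * tau_u + tau_d)          /* full windows          */
            + ((n % w) * tau_u + tau_d)) * gamma;  /* last partial window   */
}

double detection_bound(int n, double h, double c)
{
    return (h / n) * (n - floor(c) + 1);           /* eq. (4) */
}

double capping_bound(int n, int w, double h, double c, double tau_u,
                     double tau_d, int gamma, int eta,
                     double l_os, double l_hw)
{
    return detection_bound(n, h, c)
         + eta * (aggregation_bound(n, w, tau_u, tau_d, gamma)
                  + tau_d * gamma + l_os + l_hw);  /* eq. (5) */
}

int main(void)
{
    /* Illustrative parameters (assumed): 480 servers, window of 8,
     * tau_u = tau_d = 25 ms, ETX gamma = 2, one control iteration,
     * 100 ms per detection slot, cap c = 432, OS/HW delays. */
    int n = 480, w = 8, gamma = 2, eta = 1;
    double tau_u = 25, tau_d = 25, h = 100.0 * n, c = 432.0;
    double l_os = 50, l_hw = 300;

    printf("L_agg <= %.0f ms\n", aggregation_bound(n, w, tau_u, tau_d, gamma));
    printf("L_det <= %.0f ms\n", detection_bound(n, h, c));
    printf("L_cap <= %.0f ms\n",
           capping_bound(n, w, h, c, tau_u, tau_d, gamma, eta, l_os, l_hw));
    return 0;
}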
V. EXPERIMENTS
In this section, we present the experimental results of CapNet. The objective is to evaluate the effectiveness and robustness of CapNet in meeting the real-time requirements of power capping under realistic data center settings.

A. Implementation
The wireless communication side of CapNet is implemented in NesC on the TinyOS [21] platform. To comply with realistic data center practices, we have implemented the control management on the power capping manager side. In our current implementation, wireless devices are plugged into the servers directly through their serial interfaces.

B. Workload Traces
We use workload demand traces from multiple geo-distributed data centers run by a global corporation over a period of six consecutive months. Each cluster consists of several hundred servers that span multiple chassis and racks. These clusters run a variety of workloads including web search, email, Map-Reduce jobs, and cloud applications, catering to millions of users around the world. Each cluster uses homogeneous hardware, though there can be differences across clusters. We use workload traces of two representative server clusters: C1 and C2. In both clusters, each individual server has CPU utilization data for 6 consecutive months at 2-minute intervals. While we recognize that full system power also includes storage, memory, and other components in addition to CPUs, several previous works show that a server's utilization is roughly linear in its power consumption [22]-[25]. Hence, we use a server's CPU utilization as a proxy for power consumption in all experiments.
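The linearity just invoked is commonly modeled as P(u) ≈ P_idle + (P_peak − P_idle)·u. The small sketch below shows how a utilization sample could be turned into the normalized power reading p_i used by the protocol; the idle and peak wattages are assumed values, not figures from the traces.

/* Linear utilization-to-power proxy: a server's power is assumed
 * roughly linear in its CPU utilization, so utilization traces can
 * stand in for normalized power readings. Wattages are assumed. */
#include <stdio.h>

/* Normalized power p_i in [0,1] from CPU utilization u in [0,1]. */
double normalized_power(double u, double p_idle_w, double p_peak_w)
{
    double watts = p_idle_w + (p_peak_w - p_idle_w) * u;
    return watts / p_peak_w;   /* divide by the per-server maximum */
}

int main(void)
{
    /* e.g. 150 W idle, 300 W peak (assumed) */
    printf("u=0.00 -> p=%.2f\n", normalized_power(0.00, 150, 300));
    printf("u=0.50 -> p=%.2f\n", normalized_power(0.50, 150, 300));
    printf("u=1.00 -> p=%.2f\n", normalized_power(1.00, 150, 300));
    return 0;
}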
C. Experimental Setup
1) Experimental Methodology: We experiment with CapNet using TelosB motes for wireless communication. We deployed 81 motes (1 for the manager, 80 for servers) in Microsoft's data center in Redmond, WA. When we experiment with more than 80 servers to test scalability, one mote emulates multiple servers and communicates on their behalf. For example, when we experiment with 480 servers, mote 1 works for the first 6 servers, mote 2 works for the next 6 servers, and so on. We place all 81 motes in racks. The manager node is placed on the ToR and connected through its serial interface to a PC that works as the manager. No mote in the rack has a direct line of sight to the manager.

Using the workload demand traces, CapNet is run in a trace-driven fashion: for every server, the reading sent at a time stamp by its corresponding wireless mote is taken from these traces at the same time stamp. While the data traces are 6 months long, our experiments do not run for an actual 6 months. When we take a subset of those traces, say 4 weeks, the protocols skip the long time intervals in which there is no peak. For example, when we know (looking ahead into the traces) that there is no peak between times t1 and t2, the protocols skip the time between t1 and t2. Thus our experiments finish in several days instead of 4 weeks.

2) Oversubscription and Trip Time: We use the trip times from Figure 3 as the basis for determining the different caps required in the various experiments. The X-axis shows the ratio of the current draw to the rated current and is the magnitude of oversubscription; the Y-axis shows the corresponding trip time. The trip curve is shown as a tolerance band. The upper curve of the band indicates the upper bound (UB) trip times, above which lies the tripped area, meaning that the circuit breaker will trip if the duration of the overcurrent exceeds the UB trip time. The lower curve of the band indicates the lower bound (LB) trip times, under which lies the not-tripped area. The band between the two curves is the area in which it is non-deterministic whether the circuit breaker trips. The LB trip time is a very conservative bound. In our experiments we use both the LB and the UB of the conventional trip times to verify the robustness of CapNet.

3) CapNet Parameters: For all experiments, we use channel 26 and a Tx power of -3 dBm. The payload size of each packet sent by a server node is 8 bytes, which is enough for sending a power consumption reading. The maximum payload size of each packet sent by the manager is 29 bytes, the maximum default size in the IEEE 802.15.4 radio stack for TelosB motes. This payload size is set large to contain the schedules as well as control information. For the aggregation protocol, the window size ω is set to 8. A larger window size can reduce aggregation latency, but requires a larger payload size for the manager's message (since the packet contains ω node IDs indicating the schedule for the next window). In the aggregation protocol, both τ_d and τ_u were set to 25 ms. The manager sets its timeout using these values. These values are relatively large compared to the maximum transmission time between two wireless devices: the time required for communication between two wireless devices is in the range of several milliseconds. But in our design the manager node is connected through its serial interface to a PC.
The TelosB's serial interface does not always incur a fixed latency for communication between the PC and the mote. After experimenting and observing a wide variation of this time, we set τ_d and τ_u to 25 ms.

4) Control Emulation: In our experiments, we emulate the final control action since we use workload traces. We assume that one packet is enough to contain the entire control message. To handle control broadcast failures, we repeat the control broadcast γ = 2 times. Our extensive measurement study through data center racks indicated that this is also the maximum ETX of any link between two wireless motes. Upon receiving the control broadcast message, the nodes generate an OS-level latency and a hardware-level latency. We use the maximum and minimum OS-level and hardware-level times required for power capping, measured on three servers with different processors: an Intel Xeon L5520 (2.27 GHz, 4 cores), an Intel Xeon L5640 (2.27 GHz, dual socket, 12 cores with hyper-threading), and an AMD Opteron 2373EE (2.1 GHz, 8 cores), each running Windows Server 2008 R2 [17]. We generate the OS-level and hardware-level latencies using a uniform distribution over the ranges reported in [17].

D. Power Peak Analysis of Data Centers
We first analyze whether the CapNet protocol is consistent with data center power behavior, leveraging our data traces. For brevity, we present the trace analysis results of 3 racks: Racks R1 and R2 from Cluster C1, and Rack R3 from Cluster C2. To give an idea of how power consumption varies over time in a data center, Figure 6(a) shows the aggregate power of 60 servers on Rack R1 in Cluster C1 for 2 consecutive months, zoomed in on 6 consecutive days in Figure 6(b).

Fig. 6. Aggregate power of 60 servers on Rack R1 in Cluster C1: (a) over 2 months; (b) the first 6 days, zoomed in.
For each rack, we use the 95th percentile of its aggregate power over the 2 consecutive months as the power cap.

We first explore the power dynamics of the servers and the unpredictability of power capping events. Using the 2-month data, Figure 7 shows that the time interval between two consecutive peaks can range from a few minutes to several hundred hours. We define a power jump as the difference between a power value that exceeds the cap and the preceding measurement that is below the cap. As Figure 7(b) shows, the power jumps vary widely for the 60 servers of each rack (whose aggregate power lies in the range [0, 60]). These results motivate an event-driven protocol.

Fig. 7. Power characteristics (2-month data): (a) time interval between two peaks; (b) power jump.

Figure 8 illustrates the correlations across 180 servers from different racks and clusters using their raw power consumption data over 1 week. The image is a visualization of a 180 x 180 matrix, indexed by server number. That is, the entry at index [i, j] of this matrix is the correlation coefficient of the values (5040 samples) of the i-th and the j-th server. We can clearly see that servers in the same rack are strongly positively correlated and those in the same cluster are also positively correlated, whereas servers in different clusters are less, or negatively, correlated. This usually happens because servers in the same cluster host similar workloads, leading to synchronous power characteristics [25].

Fig. 8. Correlations among servers, racks, and clusters: (a) correlations among 180 x 180 server pairs in 3 racks in 2 clusters; (b) total number (out of 60) of servers that exceed the local cap; (c) probability of simultaneous peaks between two different clusters.

We further assume a local cap of c/60 (considering α = 1) for each individual server, and show in Figure 8(b) the CDF of the number of servers that exceed their local caps when the cluster-level aggregate power exceeds the cap c. The figure shows that in 80% of the cases when the rack-level aggregate power exceeds the cap c, the number of servers (among 60 servers per rack) that are over the local cap is 43, 55, and 50 for Racks R3, R1, and R2, respectively. The strong intra-cluster synchrony in power surges suggests the feasibility of detecting a cluster-level power surge based on local server-level measurements.

Figure 8(c) shows the probability of racks in the 2 clusters being at their peaks simultaneously. The entry at index [i, j] of this 2D matrix is the probability that the i-th rack of cluster 1 and the j-th rack of cluster 2 are at their peaks simultaneously. The probabilities were found to lie in the range [0, 0.0056]. This strong inter-cluster asynchrony implies that an event-driven protocol (which performs wireless communication only upon detecting an event) significantly minimizes the inter-cluster interference caused by transmissions generated by event-driven CapNet in different clusters.

We observe strong synchrony in power behavior among servers in the same cluster and strong asynchrony between different clusters. The major implication of the trace analysis is that the CapNet protocol is consistent with real data center power behavior. As the intra-cluster synchrony suggests the potential efficacy of a local event detection policy, our protocol is particularly effective in the presence of the strong intra-cluster synchrony that exists in enterprise data centers, as observed in our trace analysis. However, in the absence of intra-cluster synchrony in power peaks, CapNet will not cause unnecessary power capping control or more wireless traffic than a periodic protocol; the synchrony only enhances CapNet's performance.
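The pairwise correlations of Figure 8(a) are ordinary correlation coefficients computed over each pair of servers' one-week traces. A self-contained sketch of that computation follows; the trace layout (one array of samples per server) is an assumption.

/* Pearson correlation coefficient between two servers' power traces,
 * as used for the pairwise correlation matrix of Figure 8(a). */
#include <math.h>
#include <stddef.h>

double pearson(const double *x, const double *y, size_t len)
{
    double mx = 0.0, my = 0.0;
    for (size_t i = 0; i < len; i++) { mx += x[i]; my += y[i]; }
    mx /= len; my /= len;

    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (size_t i = 0; i < len; i++) {
        double dx = x[i] - mx, dy = y[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if (sxx == 0.0 || syy == 0.0)
        return 0.0;                /* constant trace: undefined, report 0 */
    return sxy / sqrt(sxx * syy);  /* in [-1, 1] */
}

/* Filling an n-by-n matrix corr[i][j] = pearson(trace[i], trace[j], len)
 * over one week of samples per server yields the kind of visualization
 * shown in Figure 8(a). */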
E. Power Capping Results
We now present the experimental results for CapNet's event-driven protocol. First we compare its performance with the periodic protocol and with a representative CSMA/CA protocol. We then analyze its scalability in terms of the number of servers. We first experiment with the simple case, where a single iteration of the control loop can settle to a sustained power level, and then also analyze scalability in terms of the number of control iterations, where multiple iterations are needed to settle to a sustained power level. We have also experimented under different caps and in the presence of an interfering cluster.

In all experiments, the detection phase length h was set to 100n ms, where n is the number of servers. We chose this value because it makes each slot of the detection phase equal to 100 ms, which is enough for receiving one alarm as well as for sending a message from the manager to the servers. Setting a larger value reduces the number of detection phase cycles, but also reduces the granularity of monitoring. For assigning a local cap of αc/n to the servers, we first experiment with α = 1; later, we experiment with different values of α. Condition (1) is used for detection and for starting an aggregation phase.

In the results, slack is defined as the difference between the trip time (i.e., the deadline) and the total latency required for power capping; a negative slack implies a deadline miss. We use LB slack and UB slack to denote the slack calculated using the LB trip time and the UB trip time, respectively. In some cases the timing requirements are loose, while in others they are very tight; the results are shown for all cases. We particularly care about tight deadlines and want to avoid any deadline misses.

1) Performance Comparison with Baselines: Figure 9 presents the results using 60 servers on one rack with a single-iteration control loop. We used 4 weeks of data traces for this rack. We set the 95th percentile of all aggregate power
values over all data points (at 2-minute intervals) as its cap c. For assigning the local cap we use α = 1. In running the protocols over these traces, the protocols observe all peaks. The upper bound on the aggregation latency (L_agg) given in (3) was used as the period of the periodic protocol.

Fig. 9. Performance of the event-driven protocol on 60 servers (4 weeks): (a) CDF of lower bound slack; (b) CDF of detection slots in the detection phase; (c) packet loss rate.

Figure 9(a) shows the LB slacks for both the event-driven protocol and the periodic one. The figure plots the CDF only for the cases where the magnitude of oversubscription was above 1.5, for better resolution, as the slack was too large at smaller magnitudes (which are not of interest). Since the UB trip times are easily met, we also omit those results. The non-negative LB slack values for each protocol indicate that both easily meet the trip times; hence there is no benefit in using non-stop communication (i.e., the naive periodic protocol). While the slacks of the event-driven protocol are shorter than those of the periodic protocol, because the former spends some time in the detection phase, in 80% of cases the event-driven protocol provides a slack of more than 57.5 s while the periodic protocol provides 57.88 s. The difference is not significant because, as shown in Figure 9(b), in 90% of all power capping events the detection happened in the first slot of the detection cycle. Only in 10% of cases did it happen after the first slot, and all detections happened within the 6th slot, although the phase had a total of 60 slots (for 60 servers, one slot per server). These results indicate that CapNet's local detection policy can detect events quickly. They also imply that the experimental power capping latencies are quite different from (i.e., shorter than) the pessimistic analytical values derived in (5). Also, in this experiment, 94.6% of the detection phases did not have any transmission from the servers. Therefore, compared with the periodic protocol, which must communicate continuously, the event-driven protocol suppresses transmissions by at least 94.6% while the real-time performance of the two protocols is similar.

We also evaluate the performance when BoxMAC (the default CSMA/CA-based protocol in TinyOS [21]) is used for the power capping communication, for the first 60 capping events in the data traces. Figure 9(c) shows that it experiences a packet loss rate of over 74% while performing the communication for a power capping event. This happens because all 60 nodes try to send at the same time, and the back-off period of 802.15.4 CSMA/CA under the default setting is too short, which leads to frequent repeated collisions.
Since we lose most of the packets, we do not consider latency under CSMA/CA. Increasing the back-off period reduces collisions but results in long communication delays. In subsequent experiments, we exclude CSMA/CA as it is not a good fit for power capping.

2) Scalability in Terms of Number of Servers: In our data traces each rack has at most 60 active servers. To test with more servers, we combine multiple racks of the same cluster, since they have similar power consumption patterns (as discussed in Subsection V-D). For the sake of experimentation time, in all subsequent experiments we set the cap at the 98th percentile (which results in a smaller number of capping events). The lower bound slack distributions are
shown in Figure 10 for 120, 240, and 480 servers, obtained by merging 2, 4, and 8 racks, respectively (for single-iteration capping). Hence, for a single iteration, the deadlines are easily met even for 480 servers (since in each setup, 100% of the slack values are positive).

Fig. 10. CDF of LB slack under various numbers of servers (4 weeks).

3) Experiments under Varying α: We now experiment with different values of α for assigning a local cap of αc/n to the servers, using 480 servers. The results in Figure 11 show the tradeoff between the false alarm rate and the power capping latency under varying α. As we decrease the value of α from 1 to 0.8, the false alarm rate decreases from 45% to 2%. This happens because, with a decreased value of α, CapNet considers multiple alarms before detecting a potential event. Note that this alarm rate is very small relative to the whole time window, since power capping happens in at most 5% of cases; alarms are therefore also generated rarely. Since waiting for multiple alarms increases the detection latency, the total power capping latency increases as α decreases. However, as this latency increase happens only in the detection phase, which is negligible compared to the total capping latency, there is hardly any impact on the deadline miss rate. The figure shows a deadline miss rate of 0 under varying α.

Fig. 11. Deadline (trip time) miss rate and false alarm rate under varying α.

4) Scalability in Number of Control Iterations: We now consider a conservative case where multiple iterations of the control loop are required to settle to a sustained power level [14], [17], [18]. The number of iterations required for the rack-level loop, as measured in [18], can be up to 16 in the worst case (which happens very rarely). Hence, we now conduct experiments with multiple numbers of control iterations (up to 16, assuming a pessimistic scenario). We plot the results in Figure 12 for various numbers of servers under various numbers of iterations.

Fig. 12. Multi-iteration capping under the event-driven protocol (4 weeks): (a) LB slack for 120 servers; (b) LB slack for 480 servers; (c) miss rate (LB trip time).

As shown in Figure 12(a), for 120 servers under the 16-iteration case, we have 3% of cases with negative slack, meaning that the LB trip times were missed. However, the UB trip times were met in 100% of cases. Note that we have considered a quite pessimistic setup here, because using 16 iterations as well as trying to meet the lower bound of trip times are both very conservative. For 120 servers under 8 iterations, slacks were negative in 0.3% of cases. However, in 80% of cases the slacks were above 92.492 s, 66.694 s, and 22.238 s for 4, 8, and 16 iterations, respectively, indicating that the trip times were easily met and the system could oversubscribe safely. For 4 iterations, the minimum slack was 23.2 s. To preserve figure resolution, we do not show the UB slacks since they were all positive.

For 480 servers (Figures 12(b) and 12(c)), 98.95%, 97.86%, 94.93%, and 67.2% of the LB trip times were met for 2, 4, 8, and 16 iterations, respectively. For 240 nodes, we miss deadlines in 5% of cases under 8 iterations and 3.94% of cases under 16 iterations. In all cases we met the UB trip times 100% of the time. Note that assuming 16 iterations and considering LB trip times are very conservative assumptions, as these rarely occur. Hence, the above results show that, even for 480 servers, the latencies incurred by CapNet for power capping remain within even the conservative latency requirements in most cases.

5) Experiments under Varying Caps: In all the experiments performed so far, CapNet was able to meet the UB trip times. We now change the setup to create scenarios in which the UB trip times can be smaller, by making the oversubscription magnitude higher.
For this purpose, we decrease the cap so as to shorten the trip times and create scenarios in which upper bound trip times may be missed, in order to test the robustness of the protocol. We again set the 95th percentile of aggregate power as the cap. This gives the previous capping events shorter deadlines, since a smaller cap implies a larger magnitude of oversubscription. For the sake of experimentation time, we only tested with 120 servers and their 4-week data traces.

Fig. 13. Capping under different caps on 120 servers (4 weeks): (a) CDF of slack values (cap: 95th percentile); (b) LB trip time miss rate; (c) UB trip time miss rate.

Figure 13 shows that we now miss more LB trip times, and also miss some UB trip times, since the deadlines have become shorter. However, UB trip times are missed only in 0.1% and 0.2% of cases under 8 and 16 iterations, respectively, while LB deadlines were missed in 2.4%, 6.84%, and 26.56% of cases under 4, 8, and 16 iterations, respectively. All deadlines were met for up to 3 iterations (not shown in the figures); we show the results only for higher numbers of iterations, which rarely occur. These results demonstrate robustness at larger magnitudes of oversubscription, in that even with 16 iterations only 0.2% of the UB trip times are missed.

6) Experiments in the Presence of Multiple Clusters: We have shown through the data center trace analysis in Figure 8(c) that the probability of two clusters being over the cap simultaneously is no greater than 0.0056. Nevertheless, in this section we perform an experiment from a pessimistic point of view. In particular, we evaluate the performance of CapNet under an interfering cluster. We mimic an interfering cluster of 480 servers in the following way. We select a nearby cluster and place a pair of motes in its rack: one at the ToR and the other inside the rack. We set their Tx power to the maximum (0 dBm). The mote at the ToR represents the interfering cluster's manager and carries on a pattern of communication like a real manager controlling 480 servers. The mote inside the rack responds as if it were connected to each of the 480 servers. Specifically, the manager executes a detection phase of 48,000 ms, and the node in the rack randomly selects a slot between 1 and 480. In that slot, it generates an alarm with probability 5%, since capping happens in no more than 5% of cases. Whenever the manager receives the alarm, it generates a burst of communication in the same pattern it would use for 480 real servers. After finishing this pattern of communication, it resumes the detection phase. We run the main cluster (the system used for the experiment) using 4 weeks of data traces, and plot the results in Figure 14.
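The interferer emulation just described can be sketched as a simple loop. In the C sketch below, the timing constants, helper functions, and random-number source are placeholders standing in for the motes' actual TinyOS code.

/* Sketch of the interfering-cluster emulation: one mote acts as the
 * neighboring manager and one mote stands in for all of its servers.
 * Timing constants and hooks are placeholder assumptions. */
#include <stdlib.h>

#define INTERF_SERVERS 480   /* emulated servers in the interfering cluster */
#define SLOT_MS        100   /* per-slot length of the detection phase (assumed) */

extern void sleep_ms(unsigned ms);             /* platform timer, placeholder   */
extern void send_alarm_burst(void);            /* in-rack mote raises an alarm  */
extern void run_capping_traffic(int servers);  /* manager mote replays a full
                                                  collection + control pattern  */

void interfering_cluster_round(void)
{
    int slot  = 1 + rand() % INTERF_SERVERS;   /* random slot in 1..480        */
    int alarm = (rand() % 100) < 5;            /* alarm with 5% probability    */

    sleep_ms((unsigned)(slot - 1) * SLOT_MS);  /* wait for the chosen slot     */
    if (alarm) {
        send_alarm_burst();
        run_capping_traffic(INTERF_SERVERS);   /* same burst as for 480 servers */
    } else {
        /* stay silent for the rest of the detection phase */
        sleep_ms((unsigned)(INTERF_SERVERS - slot + 1) * SLOT_MS);
    }
    /* the emulated manager then starts its next detection phase */
}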
Fig. 14. Capping for 480 servers under an interfering cluster: (a) capping latency; (b) miss rate.

Figure 14(a) shows the latencies of the different capping events in the 4 weeks of data, both under interference and without interference (when there was no other cluster). Under an interfering cluster, the delays mostly increase. This happens because the event-driven protocol experiences packet losses and retransmits, thereby increasing network delays. While the maximum increase was 24.63 s, in 80% of cases the increase was less than 5.89 s. We observed that such large increases were caused by the loss of alarms in a detection phase, which deferred detection to the next phase (while the phase length is 48 s). Power capping was still successful in all cases except those in which the control broadcast was lost. Among 375 events, 4 broadcasts were lost at some server even after 2 repetitions, resulting in control failure in about 1% of cases. This value became 0 in the multi-iteration cases: at least one control broadcast succeeded in every event, so no capping failed due to control message loss. However, as the delays from transmission failure and recovery increased the detection phase latency, we did experience capping failures. For 16 iterations, we missed the upper bound of the trip time in 4.27% of cases and the lower bound in 32.8% of cases. However, we use conservative assumptions here: for 4 iterations the miss rates were only 5.6% and 8.26%, and for 2 iterations they were only 2.3% and 2.4%, which is very marginal. These results indicate that even under interference, CapNet demonstrates robustness in meeting the real-time requirements of power capping.

VI. DISCUSSIONS AND FUTURE WORK
While this paper addresses feasibility, protocol design, and implementation, several engineering challenges such as security, EMI, and fault tolerance still need to be addressed.

Fault Tolerance. One important challenge is handling the failure of the power capping manager of a cluster. To address this, power capping managers can be connected among themselves, either through a different band or through a wired backbone. As a result, when a manager fails, a nearby one can take over its servers. This paper focuses on communication within a single cluster; DCM fault detection, isolation, and node migration need to be studied in future work.

Security. Another challenge is the security of the management system itself.
Since the system relies on wireless control, someone might be able to maliciously tap into the wireless network and take control of the data center. There are two typical approaches to handling this security issue. First, the signal itself should be attenuated by the time it reaches outside the building. We can identify secure locations inside the data center from which the controller can communicate, and identify a signature for the controllers that would be known to the server machines. Second, it is possible to encrypt wireless messages, for example using MoteAODV (+AES) [26]. We
can also use shielding within the data center to keep the RF signals contained within the enclosed region.

EMI & Compliance. While less emphasized in research studies, a practical concern of introducing wireless communication in data centers is that it must not adversely impact other devices. FCC-certified IEEE 802.15.4 circuit designs are available (e.g., [27]). Previous work has also used WiFi and ZigBee in live data centers for monitoring purposes [9].

VII. RELATED WORK
In order to reduce the capital spending on data centers, enterprise data centers use an over-subscription approach, as studied in [12]-[15], which is similar to over-booking in airline reservations. Server vendors and data center solution providers have started to offer power capping solutions [28], [29]. Power capping using feedback control algorithms [30] has been studied for individual servers. In contrast, this paper concentrates on coordinated power capping, which is more desirable in data centers as it allows servers to exploit power left unused by other servers. While such power capping has been studied before [14], [18], [31]-[34], all existing solutions rely on a wired network for controller-server communication. In contrast, we focus on wireless networking for power capping; we have outlined the advantages of wireless management in Section II.

Previous work on using wireless networks in data centers targets the high-bandwidth (e.g., 60 GHz radio) production data network [35]. In contrast, CapNet is targeted at data center management functions that have much lower bandwidth requirements while demanding real-time communication through racks. RACNet [9] is a passive monitoring solution for data centers that monitors temperature and humidity across racks, with all radios mounted at the top of the racks. Our solution enables active control and requires communication through racks and server enclosures, and hence encounters fundamentally different challenges. RACNet also does not have real-time features, while CapNet is designed to meet the real-time requirements of power capping.

VIII. CONCLUSION
Power capping is a time-critical management operation for data centers, which commonly oversubscribe their power infrastructure for cost savings. In this paper, we have designed CapNet, a low-cost, real-time wireless management network for data centers, and validated its feasibility for power capping. We deployed and evaluated CapNet in an enterprise data center. Using server power traces, our experimental results on a cluster of 480 servers inside the data center show that CapNet can meet the real-time requirements of power capping. CapNet represents a promising step towards applying low-power wireless networks to time-critical, closed-loop control in DCM.

ACKNOWLEDGMENT
This work was supported, in part, by Microsoft Research and by NSF through grants CNS-3292 (NeTS) and 44552 (NeTS).

REFERENCES
[1] M. Isard, "Autopilot: automatic data center management," Operating Systems Review, vol. 41, pp. 60-67, 2007.
[2] M. Al-Fares, A. Loukissas, and A. Vahdat, "A scalable, commodity data center network architecture," in SIGCOMM '08.
[3] Private communication with data center operators.
[4] www.cdwg.com/shop/products/digi-passport-48-console-server/377.aspx
[5] http://www.cdwg.com/shop/search/servers-server-management/Servers/x86-Based-Servers/result.aspx?w=S62&pCurrent=&p=28&a52=22
[6] http://www.cdwg.com
[7] http://www.cdwg.com/shop/search/networking-products/Ethernet-Switches/Fixed-Managed-Switches/result.aspx?w=N&MaxRecords=25&SortBy=TopSellers
[8] http://www.digikey.com/us/en/techzone/wireless/resources/articles/comparing-low-power-wireless.html
[9] C.-J. M. Liang, J. Liu, L. Luo, A. Terzis, and F. Zhao, "RACNet: a high-fidelity data center sensing network," in SenSys '09, 2009.
[10] K. Srinivasan and P. Levis, "RSSI is under appreciated," in EmNets '06.
[11] Hamilton, 2008, http://perspectives.mvdirona.com.
[12] S. Pelley, D. Meisner, P. Zandevakili, T. F. Wenisch, and J. Underwood, "Power routing: Dynamic power provisioning in the data center," in ASPLOS '10.
[13] X. Fu, X. Wang, and C. Lefurgy, "How much power oversubscription is safe and allowed in data centers?" in ICAC '11.
[14] H. Lim, A. Kansal, and J. Liu, "Power budgeting for virtualized data centers," in USENIX ATC '11.
[15] X. Fan, W.-D. Weber, and L. A. Barroso, "Power provisioning for a warehouse-sized computer," in ISCA '07.
[16] http://literature.rockwellautomation.com/idc/groups/literature/documents/sg/1489-sg001-en-p.pdf
[17] A. Bhattacharya, D. Culler, A. Kansal, S. Govindan, and S. Sankar, "The need for speed and stability in data center power capping," in IGCC '12.
[18] X. Wang, M. Chen, C. Lefurgy, and T. Keller, "SHIP: A scalable hierarchical power control architecture for large-scale data centers," IEEE Transactions on Parallel and Distributed Systems, vol. 23, 2012.
[19] A. Saifullah, Y. Xu, C. Lu, and Y. Chen, "Distributed channel allocation protocols for wireless sensor networks," IEEE Transactions on Parallel and Distributed Systems.
[20] I. Rhee, A. Warrier, M. Aia, and J. Min, "Z-MAC: a hybrid MAC for wireless sensor networks," in SenSys '05, 2005.
[21] TinyOS Community Forum, http://www.tinyos.net/.
[22] X. Fan, W.-D. Weber, and L. A. Barroso, "Power provisioning for a warehouse-sized computer," in ISCA '07, 2007.
[23] J. Choi, S. Govindan, B. Urgaonkar, and A. Sivasubramaniam, "Profiling, prediction, and capping of power consumption in consolidated environments," in MASCOTS '08.
[24] P. Ranganathan, P. Leech, D. Irwin, J. Chase, and H. Packard, "Ensemble-level power management for dense blade servers," in ISCA '06.
[25] G. Chen, W. He, J. Liu, S. Nath, L. Rigas, L. Xiao, and F. Zhao, "Energy-aware server provisioning and load dispatching for connection-intensive internet services," in NSDI '08.
[26] W. Backes and J. Cordasco, "MoteAODV: an AODV implementation for TinyOS 2.0," in WISTP '10.
[27] http://www.microchip.com/wwwproducts/devices.aspx?dDocName=en535967
[28] http://www.intel.com/content/dam/doc/case-study/data-center-efficiency-xeon-baidu-case-study.pdf
[29] http://h20000.www2.hp.com/bc/docs/support/supportmanual/c549455/c549455.pdf
[30] Z. Wang, C. McCarthy, X. Zhu, P. Ranganathan, and V. Talwar, "Feedback control algorithms for power management of servers," in ASPLOS '08.
[31] R. Raghavendra, P. Ranganathan, V. Talwar, Z. Wang, and X. Zhu, "No power struggles: coordinated multi-level power management for the data center," in ASPLOS '08.
[32] M. E. Femal and V. W. Freeh, "Boosting data center performance through non-uniform power allocation," in ICAC '05.
[33] V. Kontorinis, L. E. Zhang, B. Aksanli, and J. Sampson, "Managing distributed UPS energy for effective power capping in data centers," in ISCA '12.
[34] Y. Zhang, Y. Wang, and X. Wang, "Capping the electricity cost of cloud-scale data centers with impacts on power markets," in HPDC '11.
[35] X. Zhou, Z. Zhang, Y. Zhu, Y. Li, S. Kumar, A. Vahdat, B. Y. Zhao, and H. Zheng, "Mirror mirror on the ceiling: flexible wireless links for data centers," in SIGCOMM '12.