TIGER: Thermal-Aware File Assignment in Storage Clusters

Ajit Chavan, Xunfei Jiang, Mohammad I. Alghamdi, Xiao Qin, Minghua Jiang, and Jifu Zhang

Department of Computer Science and Software Engineering, Auburn University, Auburn, USA
Department of Computer Science, Al-Baha University, Kingdom of Saudi Arabia
College of Mathematics and Computer Science, Wuhan Textile University, Wuhan, China
School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan, China

Abstract—In this paper, we present a thermal-aware file assignment technique called TIGER for reducing the cooling cost of storage clusters in data centers. TIGER first calculates the utilization thresholds of the disks in each node based on the node's contribution to heat recirculation in a data center. Next, TIGER assigns files to data nodes according to the calculated thresholds. We evaluated the performance of TIGER in terms of both cooling-energy conservation and response time of a storage cluster. Our results confirm that TIGER reduces the cooling-power requirements of clusters, offering about 10 to 15 percent cooling-energy savings without significantly degrading I/O performance.

I. INTRODUCTION

Thermal management for power-dense storage clusters can address cooling problems in today's data centers. In this paper, we show that thermal-aware file assignment policies can significantly reduce the cooling cost of a data center by lowering the peak inlet temperatures of storage clusters. The following three factors make thermal-aware file assignment desirable and practical for storage clusters: the high cooling cost of large-scale data centers, the rapid heat recirculation caused by data nodes in storage clusters, and the ability of file assignment policies to manage the utilization of data nodes based on I/O access patterns.

Data nodes in storage clusters are typically configured with low-power processors and RAID arrays containing multiple (4 to 32) disks. Modern storage systems account for almost 27% of total energy consumption [4]. The energy and cooling costs caused by data nodes motivate us to study file assignment solutions that can reduce the inlet temperatures of data nodes. The recirculation of hot air from the outlets of data nodes back to their inlets inevitably raises inlet temperatures and may cause hot spots [8], which forces computer-room air conditioners to work continuously at lower temperatures, increasing cooling cost. The goal of this study is to minimize heat recirculation and cooling cost, thereby increasing the energy efficiency of data centers housing storage clusters.

Disks have a non-negligible thermal impact on data nodes [2]. We developed a thermal model to estimate the inlet temperature of storage servers based on processor and disk utilizations. We compared the response time and cooling cost of storage systems managed by three data placement strategies, among which one can noticeably reduce the cooling cost of storage systems in data centers.

In this paper, we aim to develop a file placement scheme, TIGER, that offers a tradeoff between performance and thermal profile in storage clusters. At the core of our TIGER approach, the peak inlet temperatures of data nodes are reduced by virtue of thermal-aware file assignment. The file assignment process relies on I/O load thresholds that are derived in two steps. First, TIGER applies cross-interference coefficients to calculate the contribution of each node to the heat recirculation of the entire storage cluster. Next, TIGER calculates the load thresholds of the disks in each data node based on the node's contribution to heat recirculation.

II. MODELING

A. Power Model

Clusters in a data center are comprised of both computing nodes and data nodes.
The terms data nodes and storage nodes are used interchangeably throughout this paper. Let P_{comp} be the power consumed by the computing nodes and P_{storage} be the power consumed by the storage nodes, which is simply the sum of the power consumed by the individual data nodes. Therefore, the total power consumption P_C of a cluster in a data center can be calculated as:

P_C = P_{comp} + \sum_{i=1}^{N} P_{node_i}    (1)

where P_{node_i} is the power consumption of the i-th data node. The power consumption P_{node_i} in Equation (1) can be derived from (1) a fixed amount of power P_{base_i} consumed by the node's hardware (e.g., fans) other than the processor and disks, and (2) the power P_{cpu_i} consumed by the node's CPU and the power P_{d_i} consumed by the disks residing in the node, which is the sum of the power consumed by each disk in the node. Thus, we can calculate P_{node_i} as:

P_{node_i} = P_{base_i} + P_{cpu_i} + \sum_{j=1}^{D_i} P_{d_i}^{j}    (2)

where P_{d_i}^{j} is the power consumed by the j-th disk in the i-th data node and D_i is the total number of disks in the i-th data node.
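To make the node-level power model concrete, the following minimal Python sketch evaluates Equations (1) and (2); the function names and the example power figures are illustrative assumptions rather than values taken from our testbed.

# A minimal sketch of the cluster power model in Equations (1) and (2).
# All names and numbers below are illustrative assumptions.

def node_power(p_base, p_cpu, disk_powers):
    # Equation (2): P_node_i = P_base_i + P_cpu_i + sum_j P_d_i^j
    return p_base + p_cpu + sum(disk_powers)

def cluster_power(p_comp, data_nodes):
    # Equation (1): P_C = P_comp + sum_i P_node_i
    return p_comp + sum(node_power(*node) for node in data_nodes)

# Example: two hypothetical data nodes, each with four disks drawing 8 W.
data_nodes = [
    (118.0, 30.0, [8.0] * 4),   # (P_base, P_cpu, per-disk power) in watts
    (118.0, 25.0, [8.0] * 4),
]
print(cluster_power(p_comp=2000.0, data_nodes=data_nodes))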
In what follows, we model the power consumption P_{d_i}^{j} of disk j in storage node i. Disks have three modes of operation: active, idle, and sleep, each of which has a specific power requirement. We denote the power consumed by a single disk in the active, idle, and sleep modes as P_{d,active}, P_{d,idle}, and P_{d,sleep}, respectively. A power overhead is incurred when a disk transitions between modes (e.g., from the sleep mode to the active mode or vice versa). We denote the power required to spin down a disk as P_{Sdown} and the power needed to spin up a disk as P_{Sup}. Given a time interval T, let t_{active}, t_{idle}, and t_{sleep} represent the time periods during which disk j in node i is active, idle, and asleep, respectively. We denote by N_t the number of power-state transitions. We now model the disk power consumption P_{d_i}^{j} as:

P_{d_i}^{j} = \frac{1}{T} \left( t_{active} P_{d,active} + t_{idle} P_{d,idle} + t_{sleep} P_{d,sleep} + \frac{N_t}{2} (P_{Sdown} + P_{Sup}) \right)    (3)

B. Heat Recirculation Model

A handful of models have been proposed to characterize heat recirculation in data centers [3] [6] [9]. These models are well investigated and well validated, and they predict the inlet temperatures of the nodes in a cluster with reasonable accuracy. We use the model proposed by Gupta et al. [9], which characterizes heat recirculation by a cross-interference matrix A_{n \times n} = \{\alpha_{ij}\}, where \alpha_{ij} denotes the fraction of its outlet heat that node i contributes to node j. According to the model proposed in [9], the vector of inlet temperatures can be calculated as:

\mathbf{t}_{in} = \mathbf{t}_{sup} + \left[ (K - A^{T} K)^{-1} - K^{-1} \right] \mathbf{p}    (4)

where \mathbf{t}_{in} is the vector of inlet temperatures T_{in_i}, \mathbf{t}_{sup} is the vector of CRAC supply temperatures T_{sup}, and \mathbf{p} is the vector of node power consumptions P_{node_i}. K_{n \times n} is a diagonal matrix of thermodynamic constants K_i:

K_i = \rho a_i c_p    (5)

where \rho is the air density (in grams per cubic meter), a_i is the airflow rate (in cubic meters per second) of node i, and c_p is the specific heat of air (in joules per gram kelvin).

C. Cooling Cost Model

Heat recirculation and node power consumption lead to an increase in inlet temperature. A cooling system is applied to keep the raised inlet temperatures below the redline. The temperature of the air supplied by the cooling system is adjusted according to the maximum inlet temperature. The supply temperature T_{sup} affects the efficiency of the cooling system, which is quantified in terms of the Coefficient of Performance (COP) [5] [8]; see (6) below.

COP(T_{sup}) = 0.0068 T_{sup}^{2} + 0.0008 T_{sup} + 0.458    (6)

COP increases as the supply temperature goes up; a higher supply temperature results in higher cooling efficiency. Equation (7) shows how to derive the cooling cost from COP:

P_{AC} = \frac{P_C}{COP(T_{sup})}    (7)

where P_C is the total power consumed by the storage nodes in the data center [8].
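To illustrate how the thermal and cooling models fit together, the following minimal sketch evaluates Equations (4), (6), and (7) with NumPy; the cross-interference matrix, airflow rates, and power vector are small made-up examples, not measurements from a real data center.

import numpy as np

# A minimal sketch of the inlet-temperature and cooling-cost models,
# Equations (4)-(7). All numbers below are illustrative assumptions.

def inlet_temperatures(t_sup, A, K, p):
    # Equation (4): t_in = t_sup + [(K - A^T K)^-1 - K^-1] p
    D = np.linalg.inv(K - A.T @ K) - np.linalg.inv(K)
    return t_sup + D @ p

def cop(t_sup):
    # Equation (6): CRAC efficiency as a function of supply temperature
    return 0.0068 * t_sup ** 2 + 0.0008 * t_sup + 0.458

def cooling_cost(p_total, t_sup):
    # Equation (7): P_AC = P_C / COP(T_sup)
    return p_total / cop(t_sup)

# Three hypothetical nodes: cross-interference matrix A, thermodynamic
# constants K_i = rho * a_i * c_p (Equation (5)), and node power vector p.
A = np.array([[0.05, 0.10, 0.02],
              [0.08, 0.05, 0.03],
              [0.02, 0.04, 0.05]])
rho, c_p = 1184.0, 1.006              # air density (g/m^3), specific heat (J/(g*K))
a = np.array([0.15, 0.15, 0.15])      # airflow rates (m^3/s)
K = np.diag(rho * a * c_p)
p = np.array([300.0, 350.0, 320.0])   # node power consumption (W)

t_in = inlet_temperatures(t_sup=20.0, A=A, K=K, p=p)
print(t_in, cooling_cost(p.sum(), t_sup=20.0))

Raising t_sup in this sketch lowers the cooling cost through the COP term, which is exactly the lever a thermal-aware placement exploits by keeping the maximum inlet temperature low.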
III. TIGER: THERMAL-AWARE FILE ASSIGNMENT FOR DATA CENTERS

A. Basic Ideas

The goal of TIGER is to place a set of m files on a group of N nodes in such a way as to reduce the cooling cost of a data center. The service time s_f and access rate \lambda_f of file f are provided by a file access predictor (see, for example, [1]). The algorithm is comprised of two phases. In the first phase, the disk utilization thresholds are determined (see Section III-B). In the second phase, files are assigned to storage nodes until the thresholds are reached (see Section III-C). In calculating the disk utilization thresholds, we take into account both performance and thermal management. To improve I/O performance, we apply a load-balancing strategy that uniformly distributes the I/O load among all the disks. For thermal management, we follow the principle that the workload placed on a node should be inversely proportional to the node's contribution to the heat recirculation in the data center.

To place workload uniformly according to this principle, one would have to ensure that all the nodes contribute equally to heat recirculation. Achieving this goal may be difficult; therefore, it is normally useful to have a calibration phase, in which we adjust the calculated threshold according to each node's contribution to the heat recirculation. During the file assignment procedure, the list of nodes is sorted in increasing order of heat-recirculation contribution. For each node in the list, files are assigned to each disk on the node until the threshold is reached. We keep doing this until either the node list is exhausted or no files remain. If the node list is exhausted and some files remain, we start again from the first node in the list and keep assigning files until either disk utilization reaches 90% or all files have been assigned.

B. Disk Utilization Threshold Calculation

We now discuss how to calculate the disk utilization threshold used in the second phase of our approach. Recall that the utilization threshold is introduced to guide the file assignment procedure, which affects node power consumption and, in turn, has a significant impact on outlet and inlet temperatures. As mentioned earlier, we first calculate the threshold using the load-balancing strategy. The utilization of disk d is increased by u_f due to the allocation of file f. The utilization u_f is the product of the service time s_f and the access rate \lambda_f of the file. Therefore:

u_f = s_f \lambda_f    (8)

Our file assignment algorithm aims to distribute the total utilization U generated by all the files over the D available disks. We use a greedy algorithm to uniformly balance the load among all the available disks.
The disk utilization threshold U^{Th}_{avg} can then be calculated using the following expression:

U^{Th}_{avg} = \frac{1}{D} \sum_{f=1}^{m} s_f \lambda_f    (9)

This average threshold is adjusted according to each node's contribution to the heat recirculation of the data center. We characterize heat recirculation using the cross-interference coefficients. The total contribution S_i of node i to the heat recirculation of the data center can be considered as the sum of all the cross-interference coefficients of the node, normalized by the sum of the cross-interference coefficients of all the nodes. Therefore:

S_i = \frac{\sum_{j=1}^{n} \alpha_{ij}}{S_{total}}    (10)

where S_{total} = \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{ij} is the sum of the cross-interference coefficients of all the nodes in the cluster.

Ideally, uniformly distributing the workload would make all the nodes identical in terms of heat recirculation. Thus, we would have:

S_i = S_{avg}, \quad \forall i \in N    (11)

where N is the set of all nodes and S_{avg} = 1/n. Although the above expression describes the best case, (11) does not hold for most of the nodes in a data center. In real-world scenarios, a node's contribution to heat recirculation may be either higher or lower than the average contribution S_{avg}. This leads to the following two cases.

Case 1: S_i > S_{avg}. This case holds for most of the nodes near the floor. We calculate the normalized difference between S_i and S_{avg} (see (12)) and decrease the threshold for the disks in node i by the normalized difference:

\Delta S_i = \frac{S_i - S_{avg}}{S_{avg}}    (12)

U^{Th}_{low_i} = U^{Th}_{avg} - \Delta S_i \, U^{Th}_{avg}    (13)

Case 2: S_i < S_{avg}. This case holds for most of the nodes near the ceiling. As these nodes contribute less to the total heat recirculation of the data center, we place more workload on them. We calculate the normalized difference between S_{avg} and S_i (see (14)); the disk utilization threshold for these nodes is increased by the normalized difference:

\Delta S_i = \frac{S_{avg} - S_i}{S_{avg}}    (14)

U^{Th}_{high_i} = U^{Th}_{avg} + \Delta S_i \, U^{Th}_{avg}    (15)

U^{Th}_{high_i} = \begin{cases} U^{Th}_{high_i} & \text{if } U^{Th}_{high_i} < 1.0 \\ 1.0 & \text{if } U^{Th}_{high_i} \geq 1.0 \end{cases}    (16)

C. TIGER: Algorithm

The TIGER algorithm solves the thermal management problem by applying thermal-aware file assignment in data centers. TIGER relies on file access patterns and the amount of heat recirculation to make file placement decisions.

Algorithm 1: TIGER(file_info, node_info)
1:  U ← 0
2:  for f = 1 to m do
3:      U ← U + s_f λ_f
4:  end for
5:  U^{Th}_{avg} ← U / D
6:  S_total ← 0
7:  for node i = 1 to N do
8:      S_i ← Σ_{j=1}^{n} α_{ij}
9:      S_total ← S_total + S_i
10: end for
11: S_avg ← 1 / N
12: sort the nodes in increasing order of S_i
13: k ← 0
14: for all nodes i in the sorted list do
15:     S_i ← S_i / S_total
16:     if S_i > S_avg then
17:         calculate the threshold U^{Th} using Equation (13)
18:     end if
19:     if S_i < S_avg then
20:         calculate the threshold U^{Th} using Equation (16)
21:     else
22:         U^{Th} ← U^{Th}_{avg}
23:     end if
24:     for all disks j ∈ D_i do
25:         while U_j < U^{Th} do
26:             assign file f_k to disk j
27:             U_j ← U_j + λ_k s_k
28:             k ← k + 1
29:         end while
30:     end for
31: end for
32: if k < m then
33:     {some files still remain}
34:     start from the first node of the sorted list,
35:     keep assigning files to the disks in the node until the utilization of each disk reaches 0.9
36:     repeat line 35 for subsequent nodes in the sorted list until k = m
37: end if

Prior to making any file placement decision, TIGER calculates the average disk utilization threshold U^{Th}_{avg} (see lines 2-5), using the greedy method to uniformly distribute the I/O load among the available disks. Once this initial threshold is determined, TIGER computes three important factors, i.e., S_avg, S_i, and S_total, which are used to calibrate the disk utilization threshold of each node (see lines 6-11). Next, TIGER sorts the list of nodes in ascending order of heat-recirculation contribution S_i (see line 12). TIGER then picks the first node from the sorted node list and adjusts the disk utilization threshold for all the disks in the selected node depending on the values of S_i and S_avg (see lines 14-23). Finally, TIGER assigns files to each disk in the selected node until either the threshold is reached or the disk runs out of free capacity (lines 25-29). TIGER repeats steps 14-29 until all the files are placed on the disks. If the node list is exhausted and some files remain, we start again from the first node in the node list and keep assigning files until either utilization reaches 90% or all files have been assigned (lines 34-36).
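For readers who prefer executable code to pseudocode, below is a compact Python sketch of Algorithm 1 under simplifying assumptions: every disk starts empty, disk capacity is ignored, and the file, node, and interference inputs are plain tuples and lists invented for illustration. It sketches the placement logic only; it is not the simulator used in Section IV.

def tiger(files, nodes, alpha):
    """files: list of (service_time, access_rate) pairs, one per file.
    nodes: list of per-node disk counts.
    alpha: cross-interference matrix; alpha[i][j] is the fraction of
           node i's outlet heat that reaches node j.
    Returns a list of (node, disk) placements, one entry per file."""
    n = len(nodes)
    total_disks = sum(nodes)

    # Lines 1-5: average disk-utilization threshold, Equations (8)-(9).
    u_avg = sum(s * lam for s, lam in files) / total_disks

    # Lines 6-11: per-node heat-recirculation contributions, Equation (10).
    row_sums = [sum(alpha[i]) for i in range(n)]
    s_total = sum(row_sums)
    s = [r / s_total for r in row_sums]
    s_avg = 1.0 / n

    # Line 12: visit nodes in increasing order of contribution.
    order = sorted(range(n), key=lambda i: s[i])

    util = {(i, j): 0.0 for i in range(n) for j in range(nodes[i])}
    placement, k = [], 0

    def fill(i, j, limit):
        # Lines 25-29: assign files to disk j of node i up to the limit.
        nonlocal k
        while k < len(files) and util[(i, j)] < limit:
            svc, lam = files[k]
            placement.append((i, j))
            util[(i, j)] += svc * lam          # Equation (8)
            k += 1

    for i in order:
        # Lines 16-23: calibrate the node's threshold, Equations (12)-(16).
        if s[i] > s_avg:
            th = u_avg - ((s[i] - s_avg) / s_avg) * u_avg              # Eq. (13)
        elif s[i] < s_avg:
            th = min(1.0, u_avg + ((s_avg - s[i]) / s_avg) * u_avg)    # Eqs. (15)-(16)
        else:
            th = u_avg
        for j in range(nodes[i]):
            fill(i, j, th)

    # Lines 32-37: overflow pass, capping disk utilization at 0.9.
    for i in order:
        for j in range(nodes[i]):
            fill(i, j, 0.9)

    return placement

In this sketch, nodes with a below-average cross-interference row sum (typically those near the ceiling) receive the higher threshold U^{Th}_{high} and therefore absorb more files, which is the intended bias of the calibration phase.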
IV. EVALUATION

A. Baseline Algorithms

To evaluate TIGER's performance, we chose the following two baseline algorithms to compare against TIGER: a greedy load-balancing algorithm and the Coolest Inlet algorithm.

1) The Greedy Load-Balancing Algorithm: The greedy load-balancing algorithm uniformly distributes the I/O load among all available disks in the data nodes. For a fair comparison, a prediction module provides the greedy algorithm with the service time s_f and access rate \lambda_f of each file f. The greedy algorithm calculates the total I/O load caused by requests accessing all the files and then uniformly distributes this I/O load over all the disks.

2) Coolest Inlet [8]: This algorithm distributes workload based on the inlet temperatures of the nodes; it places more workload on nodes with lower inlet temperatures. Specifically, the threshold of a node is inversely proportional to the node's inlet temperature. Files are assigned to the disks up to this threshold, which is identical for all the disks in a node.

B. Experimental Setup

We use a simulator written in C for our simulation study. For most of the tests, the data center contains 2 rows of 5 racks each. A rack contains 5 chassis (or nodes), each of which contains six 1U RAID arrays. Every RAID array contains a RAID controller (no processor) and 4 hot-swappable disks, and draws 118 W when no disks are attached. Therefore, we have:

P_{a,idle} = 118 W    (17)

C. Thermal Impact of Energy-Efficient Disks

1) Scenario 1: Figure 1 shows the results for the best-case scenario. In this case, we assume that an efficient energy-saving algorithm is used, so that whenever the disks are not in the active mode, they are spun down to the sleep mode. The power consumed by a disk then has three components: the power consumed in the active mode, the power consumed in the sleep mode, and the power consumed by the transitions between states. Therefore, Equation (3) simplifies to:

P_{d_i}^{j} = \frac{1}{T} \left( t_{active}^{j} P_{d,active} + t_{sleep}^{j} P_{d,sleep} + \frac{N_t^{j}}{2} (P_{Sdown} + P_{Sup}) \right)    (18)

We observe from both Figures 1(a) and 1(b) that TIGER conserves more cooling energy than the other two algorithms. The difference in performance is substantial (almost 15%) for data center utilizations between 30% and 60%, diminishing toward the two extremes. This is because, with data center utilization between 30% and 60%, there is ample opportunity to unbalance the workload in order to achieve thermal benefits; toward either extreme, there is little room left for unbalancing the workload.

2) Scenario 2: Figure 2 shows the results for the case where no energy-efficient techniques are used to spin the disks up and down. This is the worst-case scenario in terms of energy savings. A disk is in one of only two states, active or idle; there are no transitions between the active and sleep modes, so no extra power is consumed for transitions. Therefore, Equation (3) becomes:

P_{d_i}^{j} = \frac{1}{T} \left( t_{active}^{j} P_{d,active} + t_{idle}^{j} P_{d,idle} \right)    (19)
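As a quick worked example of the two scenarios, the toy calculation below evaluates Equations (18) and (19) for a single disk; the power figures are hypothetical and serve only to show why Scenario 2 keeps per-disk power high even when the disk is mostly inactive.

# A toy comparison of per-disk power under Scenario 1 (Equation (18))
# and Scenario 2 (Equation (19)). All power figures are hypothetical.

P_ACTIVE, P_IDLE, P_SLEEP = 8.0, 6.5, 1.0   # watts, illustrative only
P_SDOWN, P_SUP = 10.0, 15.0                 # transition power, illustrative only

def disk_power_scenario1(t_active, t_sleep, n_trans, T):
    # Equation (18): the disk sleeps whenever it is not active.
    return (t_active * P_ACTIVE + t_sleep * P_SLEEP
            + n_trans / 2 * (P_SDOWN + P_SUP)) / T

def disk_power_scenario2(t_active, t_idle, T):
    # Equation (19): the disk idles whenever it is not active.
    return (t_active * P_ACTIVE + t_idle * P_IDLE) / T

# A disk that is active for 40% of a one-hour interval:
T, t_act = 3600.0, 1440.0
print(disk_power_scenario1(t_act, T - t_act, n_trans=20, T=T))  # roughly 3.9 W
print(disk_power_scenario2(t_act, T - t_act, T=T))              # roughly 7.1 W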
From Figure 2, we can see that although TIGER outperforms the other two algorithms, the differences among the three solutions are very small. The power discrepancy between the active mode and the idle mode is almost negligible, so the distribution of power among the nodes in the data center does not vary much with the workload distribution. Moreover, since idle disks consume nearly as much power as active disks, the overall power consumption of the data center is very high. This power characteristic results in high cooling cost for Scenario 2.

V. RELATED WORK

Thermal-aware workload placement strategies have been proposed in recent studies [7] [8], which indicate that the energy efficiency of CRACs can be improved by reducing the peak temperature in a data center. For example, both a genetic algorithm [8] and a sequential quadratic programming approach [7] were developed to manage workload in a way that reduces the maximum inlet temperatures. Reducing the negative impact of heat recirculation is a further step toward saving cooling energy. For example, Moore et al. designed two approaches, ZBD and MinHR [5]. The ZBD scheme uses poaching where the effect of heat recirculation is observed, whereas MinHR manages workload so that each pod in a data center generates the same amount of heat, thereby minimizing heat recirculation [5]. Wang et al. proposed a way of calculating the heat generated by jobs, which are sorted in descending order of their hotness [10]. All the above strategies focus on computing nodes and use a linear power model driven by CPU utilization. Unlike these techniques, our TIGER approach aims to reduce heat recirculation through file assignment.
[Fig. 1: Comparison of the algorithms under Scenario 1 (when a disk is not active, it is always spun down to the sleep mode). (a) Maximum inlet temperature T_in (°C) versus data center utilization (%); (b) Cooling cost (W) versus data center utilization (%).]

[Fig. 2: Comparison of the algorithms under Scenario 2 (when a disk is not active, it remains in the idle state). (a) Maximum inlet temperature T_in (°C) versus data center utilization (%); (b) Cooling cost (W) versus data center utilization (%).]

VI. CONCLUSION

In this paper, we proposed and implemented TIGER, a file assignment approach for reducing the cooling-energy requirements of data centers. TIGER first decides the disk utilization thresholds based on the inlet temperatures of data nodes. Then, files are assigned to the disks in each node provided that disk utilization stays below the corresponding threshold. We applied cross-interference coefficients to estimate the recirculation of hot air from the outlets to the inlets of data nodes. We implemented TIGER in an HP server. Our experimental results confirm that TIGER is capable of offering about 10 to 15 percent cooling-energy savings without significantly degrading I/O performance.

ACKNOWLEDGMENT

This work is supported by the U.S. National Science Foundation under Grants CCF-0845257 (CAREER), CNS-0917137 (CSR), CNS-0757778 (CSR), CCF-0742187 (CPA), CNS-0831502 (CyberTrust), CNS-0855251 (CRI), OCI-0753305 (CI-TEAM), DUE-0837341 (CCLI), and DUE-0830831 (SFS).

REFERENCES

[1] R. T. Kaushik and K. Nahrstedt. T*: A data-centric cooling energy costs reduction approach for big data analytics cloud. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12), pages 52:1-52:11, Los Alamitos, CA, USA, 2012. IEEE Computer Society Press.
[2] Y. Kim, S. Gurumurthi, and A. Sivasubramaniam. Understanding the performance-temperature interactions in disk I/O of server workloads. In HPCA, pages 176-186, 2006.
[3] L. Li, C.-J. M. Liang, J. Liu, S. Nath, A. Terzis, and C. Faloutsos. ThermoCast: A cyber-physical forecasting model for data centers. In Proc. KDD, 2011.
[4] A. Manzanares, X. Qin, X. Ruan, and S. Yin. PRE-BUD: Prefetching for energy-efficient parallel I/O systems with buffer disks. ACM Transactions on Storage, 7(1):3:1-3:29, June 2011.
[5] J. Moore, J. Chase, P. Ranganathan, and R. Sharma. Making scheduling "cool": Temperature-aware workload placement in data centers. In Proceedings of the USENIX Annual Technical Conference (ATEC '05), pages 5-5, Berkeley, CA, USA, 2005. USENIX Association.
[6] L. Ramos and R. Bianchini. C-Oracle: Predictive thermal management for data centers. In IEEE 14th International Symposium on High Performance Computer Architecture (HPCA 2008), pages 111-122, 2008.
[7] Q. Tang, S. Gupta, and G. Varsamopoulos. Thermal-aware task scheduling for data centers through minimizing heat recirculation. In 2007 IEEE International Conference on Cluster Computing, pages 129-138, Sept. 2007.
[8] Q. Tang, S. K. S. Gupta, and G. Varsamopoulos. Energy-efficient thermal-aware task scheduling for homogeneous high-performance computing data centers: A cyber-physical approach. IEEE Transactions on Parallel and Distributed Systems, 19(11):1458-1472, Nov. 2008.
[9] Q. Tang, T. Mukherjee, S. K. S. Gupta, and P. Cayton. Sensor-based fast thermal evaluation model for energy efficient high-performance datacenters. In Fourth International Conference on Intelligent Sensing and Information Processing (ICISIP 2006), pages 203-208, 2006.
[10] L. Wang, G. von Laszewski, J. Dayal, X. He, A. J. Younge, and T. R.
Furlani. Towards thermal aware workload scheduling in a data center. In 2009 10th International Symposium on Pervasive Systems, Algorithms, and Networks (ISPAN), pages 116-122, Dec. 2009.