Green Master based on MapReduce Cluster

Ming-Zhi Wu, Yu-Chang Lin, Wei-Tsong Lee, Yu-Sun Lin, Fong-Hao Liu
Dept. of Electrical Engineering, Tamkang University, Taiwan, ROC
Chung-Shan Institute of Science and Technology, Taiwan, ROC
National Defense University, Taipei City, Taiwan, ROC
dea64188@hotmail.com, bearlaker34@gmail.com, wtlee@mail.tku.edu.tw, ama.ama@mal.tbcet.et, lfh123@gmail.com

Abstract. MapReduce is a widely used kind of distributed computing system. In this paper, the Green Master based on MapReduce is proposed to solve the trade-off between load balancing and power saving. Three mechanisms are proposed in this paper to improve the efficiency of the MapReduce system. First, a brand new architecture called Green Master is designed into the system. Second, a Benchmark Score is assigned to each server in the cluster. Last, an algorithm is presented for distinguishing high-score servers from low-score servers and for using both effectively.

Keywords: MapReduce, Benchmark, Cloud Network

1 Introduction

The algorithm in this paper is used to improve system efficiency based on the MapReduce [1] model of Hadoop. Hadoop is a kind of open-source software developed from Google's MapReduce, and it creates a cluster that connects servers. The cluster pools computing resources into a computing pool, and it can be expanded further and further. In the end, we can decide what we want to compute, and how to execute the program, by coding the Map function and the Reduce function. Usually, in order to obtain the maximum computing resources, the servers must be kept in a high-speed state, but this also causes a lot of unnecessary waste. For example, server performance is usually not the same from machine to machine: some servers are very fast, and some are very slow. If we allocate the same amount of work to every server, some servers will complete their work early but still have to wait for the servers whose performance is poor, and that waiting time means wasted resources.
We will also discuss how to switch a server off when its performance is so low that it seriously degrades overall system performance.
2 Related Works

2.1 Master of MapReduce

The Master Node is the most important node in MapReduce and cannot be replaced by any other node. It includes the map function, the reduce function and the MapReduce runtime system. The master node receives commands from the user and assigns tasks to task trackers, and it stores the status of the task trackers in a database. The status takes three different values: idle, in-progress and completed. The memory address and size of the data to be processed in the distributed file system (GFS in Google, HDFS in Hadoop) are reported to the master node, which then assigns a map function and a task tracker to complete the task.

Fig. 1. MapReduce Architecture

2.2 Benchmark

A benchmark [2], generally speaking, is a value that measures something's performance or ability and allows comparison. However, performance comparison for virtualization technology is for the moment not very common, and VM benchmarking is a new type of test method. It evaluates a virtual environment built through virtualization and the VM resources (hard disks, memory) under virtual machine management. We have adopted a virtual-machine-based system build, and we introduce the benchmark mechanism to distinguish the VMs' performance.

3 Implementation

In this section, we explain how the Green Master algorithm is implemented. It includes the Green Master System, Input File Index, Server Information, Queue, Record, Load Balance Optimization, Power Saving Algorithm, and Decision Algorithm. We discuss the details in the following.
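The master-node bookkeeping described in Section 2.1 — a status table over tasks that moves each entry through idle, in-progress and completed as task trackers pick work up — can be sketched as below. The class and method names are illustrative assumptions, not Hadoop's actual API:

```python
from enum import Enum

class TaskStatus(Enum):
    IDLE = "idle"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"

class Master:
    """Minimal master sketch: tracks task status and assigns idle tasks."""

    def __init__(self, tasks):
        # Every task starts in the idle state, as in Section 2.1.
        self.status = {t: TaskStatus.IDLE for t in tasks}
        self.assignment = {}  # task -> task tracker

    def assign(self, tracker):
        """Hand the first idle task to a task tracker, if one exists."""
        for task, st in self.status.items():
            if st is TaskStatus.IDLE:
                self.status[task] = TaskStatus.IN_PROGRESS
                self.assignment[task] = tracker
                return task
        return None  # nothing left to assign

    def complete(self, task):
        """Mark a task finished once its tracker reports back."""
        self.status[task] = TaskStatus.COMPLETED

master = Master(["map-0", "map-1"])
t = master.assign("tracker-A")   # "map-0" moves to in-progress
master.complete(t)               # "map-0" moves to completed
```

A real master additionally records, per task, the HDFS location and size of its input split, which is omitted here.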
3.1 Green Master System

Fig. 2. Green Master Architecture

The Green Master is a brand new architecture transformed from Hadoop's Master, and it can be applied to every node on which Hadoop is installed. The new architecture is called the Green MapReduce System (GMS), and it helps users manage the nodes in the cluster to reduce system power consumption and server computing overhead. The Green Master does not change the Map function and the Reduce function; it only changes the task allocation in the master, according to server loading and each server's Benchmark Score, to achieve the goal of energy saving. It is not acceptable for system performance to be dragged down by a single low-efficiency virtual machine, especially in a cloud computing network environment. To solve this problem, the Green Master is designed to remove the poor servers and re-allocate the job distribution. The Green Master is divided into blocks: Input File Index, Queue, Server Information, Record, Load Balance Optimization, Power Saving Algorithm and Decision Algorithm. The Green Master adapts readily to many systems. For instance, when we need a great amount of computing resources to calculate tasks, we can use the Green Master to avoid energy waste. For another instance, when the system equipment is strongly heterogeneous, the system can use the Benchmark Score in the Green Master to arrange the task allocation according to each server's capability.

3.2 Server Information

The Server Information block in the Green Master estimates each server's capability, expressed as a Benchmark Score; it keeps running and sends the results to the Green Master. In addition, whenever a new server joins or quits the cluster, the Benchmark Scores will
change. The Benchmark Score ranges from zero to one hundred, and it is estimated from CPU computing performance, memory read/write and disk I/O rate. In other words, the best CPU response time, memory read/write and disk I/O are defined as a Benchmark Score of 100, and the poorer virtual machines' Benchmark Scores are defined relative to the best one:

Benchmark Score = (X / Top) × 100%    (3.1)

where Top is the best value among the virtual machines, and X is the value measured on the virtual machine being scored (for example its CPU response time). Because CPU response time, memory read/write and disk I/O rate all have to be considered, the formula evolves as follows:

Benchmark Score = Σ_{i=1..m} W_i × (X_i / Top_i) × 100%,  W_i ∈ {Measured Events}, i ∈ {1, 2, …, m}    (3.2)

where W_i is the weight of the i-th measured event of the Benchmark Score. In our case m is three: CPU response time, memory read/write and disk I/O rate, respectively.

3.3 Recorder

The Recorder is used for recording server information. The Recorder refreshes whenever it receives newer server information. A new recording table is established when a new node joins the cluster. Servers update and refresh the server information in the Recorder during working time.

3.4 Load Balance Optimization

Load Balance [3] Optimization allocates the work loading according to the information collected from the above-mentioned blocks. The higher the Benchmark Score, the more work loading; the lower the Benchmark Score, the less work loading. The job is allocated to the VMs through Load Balance Optimization, and the formula is the following:

Task Distribution Ratio = (Local Score / Total Score) × 100%,  Total Score = Σ_{VM=1..n} Score_VM    (3.3)

where Total Score is the sum of the VMs' Benchmark Scores, and Local Score is the Benchmark Score of the VM being estimated. In our experiment, we use six VMs in the experiment environment and calculate the work loading ratio accordingly.
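Equations (3.2) and (3.3) can be sketched directly. The function names are illustrative, and we assume each measured event has been normalized so that a larger value is better (e.g. the inverse of CPU response time), since the score ratio X/Top only makes sense in that direction:

```python
def benchmark_score(measured, top, weights):
    """Weighted Benchmark Score per Eq. (3.2):
    sum over events of W_i * (X_i / Top_i) * 100.
    'measured' and 'top' map event name -> value (larger is better);
    'weights' should sum to 1 so a best-in-class VM scores 100."""
    return sum(weights[e] * (measured[e] / top[e]) * 100.0 for e in weights)

def task_distribution_ratios(scores):
    """Task Distribution Ratio per Eq. (3.3):
    each VM's share of the job is Local Score / Total Score * 100 (%)."""
    total = sum(scores.values())
    return {vm: s / total * 100.0 for vm, s in scores.items()}

# Hypothetical six-VM cluster like the one in the experiment environment.
scores = {"vm1": 100, "vm2": 80, "vm3": 60, "vm4": 40, "vm5": 15, "vm6": 5}
ratios = task_distribution_ratios(scores)  # vm1 receives about 33.3% of the job
```

The six scores above are invented for illustration; the paper does not list the actual scores of its experimental VMs.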
3.5 Power Saving Algorithm

In this paper, the Power Saving Algorithm (PSA) [4][5] checks the utilization of the servers. In the first stage, we allocate the work loading to the VMs according to the Benchmark Score; in the second stage, we determine the utilization of the VMs. In Figure 4, we can see a huge difference in work loading between a VM with Benchmark Score 100 and a VM with Benchmark Score 5, yet they use almost the same energy. This paper presents the PSA to discuss how to strike the balance between efficiency and energy management.

T = (a × B_1) / (Σ_{j=1..n} B_j) + ε    (3.4)

E = T × Σ_{VM=1..N} P_VM    (3.5)

V = (E_1 × T_1) / (E_2 × T_2)    (3.6)

where T is the system computing time, a is the time in which one virtual machine completes the job alone, B_j is the Benchmark Score of one virtual machine, and ε is the error time. E is the energy (J) of the virtual machines, P_VM is the power (W) of a virtual machine, N is the number of virtual machines, and V is the ratio of energy consumption.

3.6 Decision Algorithm

The Decision Algorithm [6] in GMS judges whether the result from the PSA is reasonable or not. The formula is the following:

β > α    (3.7)

where α is the system consumption through the PSA, and β is the consumption without the PSA. If β is greater than α, then the system goes back to Load Balance Optimization.
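Under the definitions above, equations (3.4) and (3.5) can be exercised with a small sketch. The function names are illustrative, and the final sweep — keeping the VM count whose time-energy product is smallest, in the spirit of the ratio V in Eq. (3.6) — is an assumed selection criterion, not the paper's exact procedure:

```python
def computing_time(a, scores, eps=0.0):
    """Eq. (3.4): T = a * B_1 / sum(B_j) + eps.
    'a' is the time the reference VM (scores[0]) needs alone; work is
    split in proportion to Benchmark Scores, so all VMs finish together."""
    return a * scores[0] / sum(scores) + eps

def energy(T, powers):
    """Eq. (3.5): E = T * sum of per-VM power draw (W), giving joules."""
    return T * sum(powers)

def best_vm_count(a, scores, powers):
    """Hypothetical PSA sweep: try the top-k fastest VMs for each k and
    keep the k minimizing T * E (cf. the consumption ratio V, Eq. (3.6)).
    'powers' is assumed aligned with the score ranking."""
    ranked = sorted(scores, reverse=True)
    best_k, best_cost = 1, float("inf")
    for k in range(1, len(ranked) + 1):
        T = computing_time(a, ranked[:k])
        E = energy(T, powers[:k])
        if T * E < best_cost:
            best_k, best_cost = k, T * E
    return best_k
```

With identical 50 W machines and scores 100/80/60, adding VMs keeps shrinking T faster than E grows, so the sweep keeps all three; with very low-score stragglers the product can instead favor fewer machines, which is the trade-off the PSA targets.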
4 Simulation Results

Fig. 3. Experiment Environment

Figure 3 shows the experiment environment of this paper, including its highest-performance virtual machine.

4.1 The Relationship between System Computing Time and System Consumption

Fig. 4. System Time and Consumption (system time in seconds and system consumption in joules versus the number of VMs, from one to six)

In Figure 4, we can see that the crossover point between the system time curve and the system consumption curve lies between two VMs and three VMs. In fact, the number of VMs giving the best performance in our experiment is three.

4.2 Comparison between the Original Master and the Green Master
Fig. 5. System Computing Time (original vs. Green Master, in seconds, for Test 1 to Test 3)

Fig. 6. Power Consumption Saving (original vs. Green Master, in joules, for Test 1 to Test 3)

Fig. 7. Ratio of Power Saving (0.435, 0.44 and 0.433 for Test 1, Test 2 and Test 3, respectively)
In Figures 5 and 6, we run several different sizes of test file in our experiment environment. We can clearly see that the original system's computing time is less than the Green Master's, but its system consumption is almost twice as large as the Green Master's.

5 Conclusion

The idea of the Green Master is to optimize system power consumption by lowering performance slightly. In this paper, we provide an appropriate trade-off between power saving and performance loss, and improve the energy conservation of the system.

References

[1] Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", OSDI 2004.
[2] Xia Xie, Qingchan Chen, Wenzhi Cao, Pingpeng Yuan, Hai Jin, "Benchmark Object for Virtual Machines", 2010 Second International Workshop on Education Technology and Computer Science.
[3] Lars Kolb, Andreas Thor, Erhard Rahm, "Load Balancing for MapReduce-based Entity Resolution", 2012 IEEE 28th International Conference on Data Engineering.
[4] Tao Zhu, Chengchun Shu, Haiyan Yu, "Green Scheduling: A Scheduling Policy for Improving the Energy Efficiency of Fair Scheduler", 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies.
[5] Sandhya S V, Sanjay H A, Netravathi S J, Sowmyashree M V, Yogeshwar R N, "Fault Tolerant Master-Workers Framework for MapReduce Applications", 2009 International Conference on Advances in Recent Technologies in Communication and Computing.
[6] Congchong Liu and Shujia Zhou, "Local and Global Optimization of MapReduce Program Model", 2011 IEEE World Congress on Services.
[7] Qinlu He, Zhanhuai Li, Xiao Zhang, "Study on Cloud Storage System based on Distributed Storage Systems", Computational and Information Sciences (ICCIS), 17-19 Dec. 2010, pp. 1332-1335.
[8] Kevin D. Bowers, Ari Juels, and Alina Oprea, "HAIL: A High-Availability and Integrity Layer for Cloud Storage", Computer and Communications Security (CCS), Nov. 2009, pp. 187-198.
[9] Mingyue Luo, Gang Liu, "Distributed log information processing with Map-Reduce: A case study from raw data to final models", Information Theory and Information Security (ICITIS), 17-19 Dec. 2010, pp. 1143-1146.