1 Send Orders for Reprints to 206 The Open Fuels & Energy Science Journal, 2015, 8, Open Access The Research of Measuring Approach and Energy Efficiency for Hadoop Periodic Jobs Yu Junyang 1,2, Hu Zhigang *,1 and Han Yuanyuan 3 1 Software School, Central South University, Changsha, Hunan , P.R. China 2 Software School, Henan University, Kaifeng, Henan, , P.R. China 3 Power Copany of Kaifeng, Kaifeng, Henan, , P.R. China Abstract: Current consuption of cloud coputing has attracted ore and ore attention of scholars. Hadoop as a cloud platfor, the research about its energy consuption has also received considerable attentions fro scholars. This paper presents a ethod to easure the energy consuption of jobs that run on Hadoop, and this ethod is used to easure the effectiveness of the ipleentation of periodic tasks on Hadoop platfor. Cobining with the current ainstrea about energy estiate forula to conduct further analysis, this paper has reached a conclusion about how to reduce energy consuption of Hadoop by adjusting the split size or using appropriate size of workers (servers). Finally, experients show the effectiveness of these ethods of energy-saving and verify the feasibility of the ethod for easureent of periodic tasks at the sae tie. Keywords: Energy efficiency, energy saving, Hadoop, easureent ethod. 1. INTRODUCTION Since the concept of cloud coputing has been prooted in 2006, it has developed rapidly and any copanies have launched their own cloud coputing platfor, such as Azure fro Microsoft, Google Copute Engine fro Google, IBM Blue Cloud Coputing Platfor fro IBM and Dynao fro Aazon . Although cloud coputing technology develops continuously, Hadoop platfor is the ost popular object that any scholars study. Hadoop coputing platfor is coposed of MapReduce coputing fraework and HDFS file storage syste, and current research about Hadoop [2, 3] focuses on perforance optiization but it s processing efficiency is not high for sall files [4, 5]. Fro the point of energy efficiency, this essay proposed a ethod to easure the effectiveness of periodic tasks that run on Hadoop platfor, and it also studied how to be energy efficiency though the analysis of easureent ethods. 2. RESEARCH STATUS The first step to study the energy efficiency of the cloud is to have evaluation criteria. The evaluation criteria of IDC consuption norally consist of three odes, including easures of the server, the overall energy consuption of the engine roo and the specific energy of the cluster The Energy Efficiency Standards Test of SPEC  SPEC (Standard Perforance Evaluation Corporation) is a global authority in application perforance testing organization whose ain coponents are world-renowned universities, research institutions and soe IT copanies. The ain task of the organization is to establish, aintain, and support a range of relevant bencharks set which are used to test server and related coponents. Currently testing standards for server power consuption is SPECpower_ssj 2008, and energy efficiency units are ssjops/watt, which eans the nuber of operations about executive standard. Sources as shown in Equation 1: Perforance to Power Ratio= ssj_ops / power (1) Perforance to Power Ratio eans the powerperforance ratio, naely energy efficiency. Σ ssj_ops represent standard operating unit tie perfored by the server. Σpower represents the power consued by the server in a period of tie. This ethod is often used to assess the situation on the energy efficiency of servers The Standard of Green Grid  Green Grid was founded in 2007, a third-party non-profit organization, which utility copanies, including AMD, HP, IBM and other copanies, collaborate to iprove the resource efficiency of data centers. The study of energy conducted by the organization ranges fro the location of the data center to the server architecture and the use of refrigeration equipent, etc, and research results are X/ Bentha Open
2 The Research of Measuring Approach and Energy Efficiency for Hadoop Periodic Jobs The Open Fuels & Energy Science Journal, 2015, Volue published in The Green Grid Connection. As a result, the energy efficiency of data centers is ore acroscopic, as forula 2,3: PUE = total facility power / IT equipent power (2) DCiE = IT equipent power consuption 100% / total power consuption (3) PUE eans Power Usage Effectiveness. DCiE eans Data Center Infrastructure Efficiency. By the estiate and copare the PUE, enterprises can help to optiize space utilization and iprove data center energy efficiency. The easureent criteria are ore coprehensive and closer to the data center energy efficiency assessent of the entire project The Standard of SONG  Forula 1 is standard for the estiation of single server energy efficiency. Forula 2 and 3 is standard for the estiation of the entire data center. A scholar naed Song Jie presented a easureent that based on MapReduce Cloud. In his ethod, SONG defined coputation processor at 1 GHz frequency 1 s copleted for 1 U, his expression of energy efficiency forula is illustrated in the forula (4) η(t ) = L(T ),E(T ) 0 (4) E(T ) η represents energy efficiency, L (T) represents the tie to coplete the task within the T (the unit is U). E (T) represents the energy consuption of tie T (the unit is Joules). According to the study of Elnozahy, when the coputer CPU is running at full capacity of frequency f, the power is P, and then the power P and frequency f satisfies the equation (5) P = P fix + P f f 3 (5) P fix is the power when the coputer is in the idle state. As can be seen fro the above, the study of IT equipent energy efficiency easure focused on issues such as the size of a single node, or the entire data center. The energy efficiency easure for Hadoop platfor lacks research. Hadoop platfor has its own peculiarities; therefore, its energy efficiency easure has its own characteristics. The research result of SONG copensate for the energy efficiency easure about Hadoop platfor to soe extent. To calculate the copetency-based etrics can effectively copare different energy efficiency differences between cloud platfors. However, in actual operation, the workers in Hadoop platfor have heterogeneity. There will always be periodic task to run on (search, log analysis, etc.) on one Hadoop platfor, and the efficiency is not the sae by using different algoriths to the sae proble. This essay fro another angle, easuring different algoriths to solve the sae proble on the sae Hadoop platfor, studies the energy-saving proble. 3. MEASURES ABOUT PERIODIC TASK ENERGY EFFICIENCY Cloud platfor is isoorphic in Song s  study, and his easure can judge the efficiency of different cloud platfor. However, in practice, for progras run on the sae cloud platfor it is unable to assess the progras energy efficiency especially when the progras achieve the sae goal. This essay studies easures of algoriths on the sae cloud platfor, the significance lies in how to iprove the algorith to ake it ore energy efficiency. Hadoop platfor often perfor any periodic tasks, therefore, it is ore eaningful to study how to be ore efficiency by adjusting easures The Deduction of Measure Methods We assue that in unit tie, Hadoop clusters electricity consued E, the nuber of tasks copleted per unit tie is n, this job s energy efficiency is: µ= E n That eans the efficiency of the jobs run on the Hadoop platfor is the ratio of electricity consued per unit tie and the period of tie to coplete the nuber of jobs. The following is a further derivation of the forula (6) µ= E n = E t n t = P f Forula 7 eans the job s efficiency on the Hadoop platfor is the ratio of the average power of its running tie and the frequency of the task. The unit of P is W (watts), the frequency f is expressed in Hz (hertz), the energy efficiency of the unit is W / Hz. W / Hz is the job efficiency on Hadoop platfor. There are two reasons to use equation 6 to explain: Firstly, Hadoop platfor does not shut down after finishing the first job in practice, therefore, the idle tie of job interval should be taken into account. The idle tie would be ore if the speed of task execution increases. Secondly, any tasks that Hadoop platfor perfors, such as log analysis or searching, are cyclical, which are based on reality Correlation Analysis of Energy-Saving The research about what is related to task efficiency, how to iprove algoriths energy efficiency and reduce energy consuption. Assuing that Algorith1 and Algorith2 can finish the sae job, in the sae Hadoop cluster in less than an hour to coplete these operation n ties. Algorith 1 coplete a task within tie t 1 and Algorith2 coplete a task within tie t 2. Suppose t 1 t 2, fro this sense, Algorith 1 is ore efficient than Algorith 2, but how to define the relationship between the? Assuing P i is the power consuption when Hadoop cluster is idle, P 1 is the average power when Algorith1 perfor tasks, P 2 is the average power when Algorith 2 perfor tasks. The energy (6) (7)
3 208 The Open Fuels & Energy Science Journal, 2015, Volue 8 Junyang et al. consuption of Algorith1 and Algorith2 within an hour is: E 1 = n* t 1 * P 1 +(3600- n* t 1 )* P i (8) E 2 = n* t 2 * P 2 +(3600- n* t 2 )* P i (9) Here is further analysis about E1 and E2 to deduce energy efficiency and the relationship between energy efficiency and the average power of algorith. E 1 -E 2 = n(t 1 P 1 - t 2 P 2 )+n P i (t 2 - t 1 ) (10) According to equation (10), the coparison about algorith efficiency can be transferred to the coparison between (t 1 P 1 - t 2 P 2 )and P i (t 2 - t 1 ). Fro the angle of physics, (t 1 P 1 - t 2 P 2 )represents the difference between the copletion of a task Algorith1 and energy consuption of Algorith2, (t 1 P 1 - t 2 P 2 ) represents the difference between the energy consuption in the ipleentation of idle tie on Algorith1 and Algorith 2 Hadoop cluster. In other word, the idle tie would be longer if shortening the tie to coplete the task (such as periodic tasks, searching and sorting). Therefore, energy saving should be the difference between the task tie and idle tie energy consuption as a whole. Generally, it is difficult to continue to deterine the relationship between energy consuption and tie of the algorith. The tie to coplete the task would be reduced because of high efficiency, but the idle tie will be longer. In order to ake ore accurate judgent, we introduce FAN  odel, which will link power P of the server and the utilization of CPU. P pred = C 0 + C 1 * U cpu (11) C 0 is a constant and is the power when the server is idle, and C 1 is the difference between the server and the server at the power peak power in the idle state. There is another linear odel based on (11) with a calibration ite. However, forula (11) is recognized as the foundation with high degree of accuracy. We use forula (11) as a basis for research, assuing Algorith 1 and Algorith 2 running on cluster that consist of servers, and then the variable of forula (10) can be decopose into forula (12-14). The job runs in soe nodes, turning down the unnecessary nodes is a way to anage energy, and energy efficiency anner is necessary to refine the algorith to each server. We have coparison about the energy consuption of Algorith 1 and Algorith 2 per second. P i = p ni (12) P 1 = p n1 (13) P 2 = p (14) n2 For server A, the energy consuption of Algorith1 and Algorith2 is E a1 and E a2 respectively. E a1 -E a2 = f(t 1 P a1 - t 2 P a2 )+ fp ai (t 2 - t 1 ) (15) This step is the sae as equation (10), but equation (10) reflects the condition of the whole cluster. Equation (15) is specific to a server. The question about server A that processes tasks under two algoriths is whether efficient or not becoes the coparison between (t 1 P a1 - t 2 P a2 ) and P ai (t 2 - t 1 ). Fro these two equations, it is ore energy efficiency when the power of server is lower and the tie is shorter under the execution. Suppose, the average utilization of CPU is α when using Algorith1 processing tasks, and the average utilization of CPU is β when using Algorith2 processing tasks. Server idle power consuption is about the energy consuption of its full load operation 70%, that is P ai = 0.7 P aax. Cobining equation (11), there is P a1 = P ai + P a α (16) P a2 = P ai + P a β (17) Bringing equation (16) (17) into Equation (15) E a1 -E a2 =f((t 1 P a1 - t 2 P a2 )+ P ai (t 2 - t 1 ))=f P a (α t 1 - β t 2 ) (18) For server A, f P a is a constant, so when is the ost energy-efficient is that (α t 1 - β t 2 ). Assuing R1 (MI) and R2 (MI) are calculated when the aount of server to perfor task utilizing Algorith 1 and Algorith 2 (When the nuber of nodes to perfor jobs and the scheduling algorith is not change, when coplete a job, nodes that share coputational tasks barely changed) (MIPS)is coputing power of CPU for server a, and then it is concluded that: α = R 1 t 1 β = R 2 t 2 Cobining (18) and (19)(20) Ea1 -Ea2 = f Pa( R 1 R 2 (19) (20) ) (21) Because f, P a and are constants in equation (21), that is believed that coputational algorith directly deterines its energy efficiency for servers. For Hadoop cluster, the ost iportant way to iprove energy efficiency is to reduce the aount of calculations. However, there are three ways when the nuber of tasks is unchanged. (1) Maintaining appropriate node size, such as turning off unnecessary nodes. (2) The energy consuption is the lowest when nodes are load balancing for isoorphic server cluster. (3) Reducing the aount of calculation by adjusting algorith for nonisoorphic server cluster. The first ethod is the ost coonly used ethod, but the use of ethod 1 ust be deployed ore copies and using a special replica placeent strategy. This paper focuses on is how to reduce the aount of calculation task in the syste. 4. THE RESEARCH OF ENERGY EFFICIENCY Fro the above derivation, the energy efficiency of different algoriths on this platfor can be well perfored
4 The Research of Measuring Approach and Energy Efficiency for Hadoop Periodic Jobs The Open Fuels & Energy Science Journal, 2015, Volue in equation 2, and the ost iportant factor about the energy efficiency of the algorith is to coplete calculation of the task. Energy-saving question then becoes to reduce the coputational probles. There are three ways to reduce calculation by analysis of the principles about Hadoop platfor. (1) Iproving algoriths and optiizing it. (2) According to Hadoop fraework, optiizing processing tasks link for MapReduce and reduce the occupancy for Hadoop. The first is coonly used, but when the algorith can not be optiized when and how to be energy-efficient in coputing fraework -- we can analyze the MapReduce fraework. Input Map tasks Reduce tasks output Table 1. Cluster configuration. Hadoop-A Switch D-link DES-3028 Hadoop0 cpu: 2.00G (2) eory: 1.97G Hadoop1 cpu: 2.50G (2) eory: 1.96G Hadoop2 cpu: 2.50G (2) eory: 2.95G Hadoop3 cpu: 2.30G (3) eory: 1.84G Hadoop4 cpu: 1.60G (2) eory: 1.97G Hadoop-B Switch D-link DES-3028 Hadoop0 cpu: 2.00G (2) eory: 1.97G Hadoop1 cpu: 2.50G (2) eory: 1.96G Hadoop2 cpu: 2.50G (2) eory: 2.95G split0 split1 split2 splitn shuffle Fig. (1) Work flow chart of MapReduce. Outputfile 0 Outputfile 1 Outputfile 2 When the job scheduler in MapReduce is working, it will establish a Map for each file and assign tasks to Task Tracker. Fro this process it can be seen that a file corresponds to a Map, (Each task operation of Map includes production, scheduling and end tie a large nuber of Map tasks will cause soe perforance loss) but a Map would start a thread. The effective utilization of the resources is lower if there are ore threads, because the threads will occupy ore resources. As a result of it, doing erge pretreatent for sall files fro the analysis about working principles of MapReduce, and it can reduce energy consuption and iprove energy efficiency in theory. MapReduce workflow has been shown in Fig. (1); Each file corresponds to a split while Hadoop default data block size is 64 M. In reality, there are any files stored below this block size, soe sall files in particular. If erge processing sall files, it can reduce the nuber of Map, reducing the nuber of processes in the syste. In the process of reduction, the nuber of erger would be reduced. We can confir the efficiency and its effects by section 5 of the experient. 5. EXPERIMENTAL COMPARISON The experient has two purposes, the first is to verify the effectiveness of energy efficiency etrics in section 4, and the second is to verify the data preprocessing for crushing energy savings are correct in section 5. Experiental environent is Hadoop2.2.0, 5 sets of heterogeneous PC, hadoop0 is the node of Master, the reaining four are nodes for Slave, this cluster naed Hadoop-A. Used hadoop0 as node for Master, Hadoop1 and hadoop2 are nodes for Slave, which is called cluster Hadoop-B Data Merge Saving This essay used benchark progra wordcount in Hadoop as an exaple, randoly selecting 400 articles and then testing, each article as a file (less than 2 M). Within one hour of the Hadoop-A cluster do wordcount 4 ties, after that every four articles erge into one (Reduce syste footprint) and still do wordcount 4 ties, and then aking energy coparison. The following Figs. (2, 3) is is a coparison of CPU and eory before and after the erger do wordcount, CPU and MEMORY are the two ost iportant parts in a coputer: Fig. (2). The change of entire cluster CPU utilization. After erging, although the CPU and Meory utilization at peak higher than the before, it is uch faster to process data. As shown in Table 2, the energy efficiency iproved 14.63% after cobination, which confired the saving ethod proposed in section Energy Efficiency Metrics In Table 3, energy efficiency is calculated by Equation 1 and different scale Hadoop cluster on the sae batch of data
5 210 The Open Fuels & Energy Science Journal, 2015, Volue 8 Junyang et al. before and after the erger when wordcount. Hadoop-A and Hadoop-B represent different Hadoop clusters, and Hadoop- B cluster is a sall cluster consists of three nodes. q And h represents data processing without erge and data processing with erge respectively. Discriination is quite obvious whether it is different algoriths on sae platfor or different platfors on sae algorith. Therefore, equation1 can not only easure energy efficiency of a task on different Hadoop platfor, but also easure the energy efficiency of different algoriths to achieve the sae goal. Forula 1 is a ore reasonable way to easure. CONCLUSION For cloud coputing clusters, this essay proposed a very effective way to easure, which significantly distinguishes the relationship of energy efficiency in different ways to perfor the sae task (algorith) on the sae platfor. The effective way to save energy those erger sall files on Hadoop and this ethod has been proved to be efficient by experients. The next step to study is how to divide the data blocks achieve optial energy efficiency. CONFLICT OF INTEREST The authors confir that this article content has no conflict of interest. Fig. (3). The change of entire cluster Meory utilization. Table 2. Table 3. The energy consuption before and after the file erger (one hour). Before files erger After files erger Energy of different conditions. Energy 0.41 kwh 0.35 kwh Energy Efficiency (W/Hz) HadoopA-q HadoopA-h HadoopB-q HadoopB-h As shown in equation1, different ethods (algoriths) to achieve the sae purpose on unified platfor can be copared, and energy efficiency using the sae algorith on different Hadoop platfor can be easured. ACKNOWLEDGEMENTS This work was supported by a Grant fro the National Natural Science Foundation (Nos., ) and Science and Technology Key Project fro Henan Province Departent of Education (12B520007). Also, the authors greatly appreciate the reviewers valuable coents on this paper. REFERENCES  DeCandia, G.; Hastorun, D.; Japani, M.; Kakulapati G.; Lakshan A.; Pilchin A.; Sivasubraanian S.; Vosshall P.; Vogels W. Dynao: Aazon's Highly Available Key-Value Store, ACM Syposiu on Operating Systes Principles, Proceedings of 21 st ACM SIGOPS Syposiu on Operating Systes Principles, 2007, 14(17),  Hadoop [EB/OL].  White T. Hadoop: The Definitive Guide. O'Reilly, 2012, pp  Sall files proble [EB/OL]. 02/the-sall-files-proble/  Cheng, X.; Xiao, N.; Huang, F. Research on HDFS Based Web Server Cluster, International Conference on E-Business and E-Governent (ICEE), Shanghai China: IEEE, 2011, pp    Song, J.; Li, T.T.; Yan, Z.X.; Na, J.; Zhu, Z.L. Energy-efficiency odel and easuring approach for cloud coputing. J. Softw., 2012, 23(2),  Fan, X.; Weber, W.D.; Barroso, L.A. Power provisioning for a warehouse-sized coputer. ACM SIGARCH Cop. Arch. News, 2007, 35(2), Received: April 17, 2015 Revised: May 21, 2015 Accepted: May 26, 2015 Junyang et al.; Licensee Bentha Open. This is an open access article licensed under the ters of the Creative Coons Attribution Non-Coercial License (http://creativecoons.org/licenses/by-nc/3.0/) which perits unrestricted, non-coercial use, distribution and reproduction in any ediu, provided the work is properly cited.