Green Master based on MapReduce Cluster


Ming-Zhi Wu, Yu-Chang Lin, Wei-Tsong Lee, Yu-Sun Lin, Fong-Hao Liu

Dept. of Electrical Engineering, Tamkang University, Taiwan, R.O.C.
Dept. of Electrical Engineering, Tamkang University, Taiwan, R.O.C.
Dept. of Electrical Engineering, Tamkang University, Taiwan, R.O.C.
Chung-Shan Institute of Science and Technology, Taiwan, R.O.C.
National Defense University, Taipei City, Taiwan, R.O.C.

dea64188@hotmail.com, bearlaker34@gmail.com, wtlee@mail.tku.edu.tw, ama.ama@mal.tbcet.et, lfh123@gmail.com

Abstract. MapReduce is a distributed computing system that is widely used today. In this paper, Green Master, based on MapReduce, is proposed to resolve the conflict between load balancing and power saving. The paper proposes three mechanisms to improve the efficiency of a MapReduce system. First, a brand-new architecture called Green Master is designed into the system. Second, a Benchmark Score is assigned to each server in the cluster. Last, an algorithm distinguishes high-score servers from low-score servers and determines how to use them effectively.

Keywords: MapReduce, Benchmark, Cloud Network

1 Introduction

The algorithm in this paper is used to improve the efficiency of systems based on Hadoop's MapReduce [1]. Hadoop is open-source software developed from Google's MapReduce, and it creates a cluster that connects the servers. The cluster pools their computing resources into a computing pool, which can be expanded as needed. We then decide what to compute and how to execute the program by coding the Map function and the Reduce function. Usually, to maximize the available computing resources, the servers must be kept running at high speed, which also causes a lot of unnecessary waste. For example, performance usually differs from one server to another: some servers are very fast and some are very slow. If we allocate the same amount of work to every server, some servers complete their work early but still have to wait for the slower ones, and this waiting time is wasted resources.
We will discuss how to switch a server off if its performance is so low that it seriously affects system performance.
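To illustrate the waiting-time waste described above, the following minimal Python sketch splits a job equally over servers of unequal speed and reports how long each server sits idle. The function name, the work units and the speed values are hypothetical and are not taken from the paper.

```python
def idle_time_equal_split(total_work, speeds):
    """Split total_work equally and return (makespan, idle time per server).

    total_work: abstract work units (hypothetical).
    speeds: work units per second for each server (hypothetical).
    """
    share = total_work / len(speeds)          # same amount of work for everyone
    finish = [share / s for s in speeds]      # time each server needs for its share
    makespan = max(finish)                    # the job ends when the slowest server ends
    idle = [makespan - f for f in finish]     # faster servers wait for the slowest one
    return makespan, idle


makespan, idle = idle_time_equal_split(total_work=600.0, speeds=[4.0, 2.0, 1.0])
print(makespan, idle)
```

With these assumed numbers the fastest server finishes its share in a quarter of the total time and waits for the rest, which is the waste the Benchmark Score is meant to reduce.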

2 Related works

2.1 Master of MapReduce

The Master node is the most important node in MapReduce and cannot be replaced by other nodes. It includes the map function, the reduce function and the MapReduce runtime system. The Master node receives commands from the user and assigns tasks to task trackers, and it stores the status of the task trackers in a database. A status takes one of three values: Idle, In-processing and Completed. The memory address and size of the data to be processed in the distributed file system (GFS in Google, HDFS in Hadoop) are reported to the Master node, which then assigns the map function and a task tracker to complete the task.

Fig. 1. MapReduce Architecture

2.2 Benchmark

A benchmark [2], generally speaking, is a value that measures the performance or ability of something so that comparisons can be made. Performance comparison of virtualization technology is not yet very common, and the VM Benchmark is a new type of test method. It covers the virtual environment built through virtualization and the virtual machine management of VM resources (hard disks, memory). We adopt a system built on virtual machines and introduce the Benchmark mechanism to distinguish the VMs' performance.

3 Implementation

This section explains how the Green Master algorithm is implemented. It includes the Green Master System, Input File Index, Server Information, Queue, Record, Load Balance Optimization, Power Saving Algorithm, and Decision Algorithm, which are discussed in detail below.

3.1 Green Master System

Fig. 2. Green Master Architecture

The Green Master is a brand-new architecture transformed from Hadoop's Master, and it can be applied to every node on which Hadoop is installed. The new architecture is called the Green MapReduce System (GMS), and it helps users manage the nodes in the cluster to reduce system consumption and computing overhead. The Green Master does not change the Map function or the Reduce function; it only changes task allocation in the master according to server load and each server's Benchmark Score in order to save energy. It is not acceptable for system performance to be reduced by the low efficiency of a single virtual machine, especially in a Cloud Computing Network environment. To solve this problem, Green Master is designed to remove the poor servers and reallocate the job distribution. Green Master is divided into eight blocks, including Input File Index, Queue, Server Information, Record, Load Balance Optimization, Power Saving Algorithm and Decision Algorithm. Green Master adapts well to many systems. For instance, when we need a great amount of computing resources to calculate tasks, we can use Green Master to avoid wasting energy. As another instance, when the system equipment is strongly heterogeneous, the system can use the Benchmark Score in Green Master to allocate tasks according to each server's capability.

3.2 Server Information

The Server Information block in the Green Master estimates each server's capability, called the Benchmark Score; it keeps running and sends its results to Green Master. In addition, whenever a server joins or quits the cluster, the Benchmark Score changes. The Benchmark Score ranges from zero to one hundred and is estimated from CPU computing performance, memory read/write and disk I/O rate. The highest CPU response time, memory read/write and disk I/O are defined as a Benchmark Score of 100. The Benchmark Scores of the poorer virtual machines are defined relative to the highest one:

$$\text{Benchmark Score} = \frac{X}{Top} \times 100\% \qquad (3.1)$$

where Top is the highest value among the virtual machines and X is the measured value of the virtual machine, such as CPU response time. Because CPU response time, memory read/write and disk I/O rate all have to be considered, we extend the formula as follows:

$$\text{Benchmark Score}_{VM} = \sum_{i=1}^{m} W_i \, \frac{X_i}{Top_i}, \quad W_i \in \{\text{Measured Event}\},\ i \in \{1, 2, \dots, m\} \qquad (3.2)$$

where W_i is the i-th event of the Benchmark Score. In our case m is three: CPU response time, memory read/write and disk I/O rate.

3.3 Recorder

The Recorder records server information. It refreshes whenever it receives newer server information. A new record table is created when a new node joins the cluster. Servers update and refresh their information in the Recorder during working time.

3.4 Load Balance Optimization

Load Balance [3] Optimization allocates the workload according to the information collected from the blocks above. The higher the Benchmark Score, the larger the workload; the lower the Benchmark Score, the smaller the workload. Jobs are allocated to VMs through Load Balance Optimization with the following formula:

$$\text{Task Distribution Ratio} = \frac{\text{Local Score}}{\text{Total Score}} \times 100\%, \quad \text{Total Score} = \sum_{i=1}^{n} \text{Score}_{VM_i} \qquad (3.3)$$

where Total Score is the sum of the VMs' Benchmark Scores and Local Score is the Benchmark Score of the VM being evaluated. In our experiment, we use six VMs in the experiment environment and calculate the workload ratio of each VM in this way.
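The scoring and allocation steps of Sections 3.2 and 3.4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes every metric is expressed so that larger is better, that the weights sum to 1, and that the score of (3.2) is scaled to 0..100. The function names, metric names, weights and the six VM scores are hypothetical; the paper's own six-VM values are not available in the source.

```python
def benchmark_score(measured, top, weights):
    """Weighted score in [0, 100], one possible reading of (3.1) and (3.2).

    measured, top: dicts mapping a metric name to this VM's value and to the
    best value in the cluster (assumed larger-is-better).
    weights: dict mapping the same metric names to weights that sum to 1.
    """
    return 100.0 * sum(w * measured[k] / top[k] for k, w in weights.items())


def task_distribution(scores):
    """Percentage of the job given to each VM, following (3.3)."""
    total = sum(scores.values())              # Total Score
    return {vm: 100.0 * s / total for vm, s in scores.items()}


# Hypothetical measurements for one VM against the best VM in the cluster.
weights = {"cpu": 0.5, "memory": 0.25, "disk": 0.25}
score = benchmark_score(
    measured={"cpu": 50.0, "memory": 100.0, "disk": 100.0},
    top={"cpu": 100.0, "memory": 200.0, "disk": 400.0},
    weights=weights,
)

# Hypothetical Benchmark Scores for six VMs.
ratios = task_distribution(
    {"vm1": 100, "vm2": 80, "vm3": 60, "vm4": 40, "vm5": 15, "vm6": 5}
)
print(score, ratios)
```

Under these assumed scores the VM with score 100 receives one third of the job while the VM with score 5 receives under 2%, which is the proportional behaviour (3.3) describes.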

3.5 Power Saving Algorithm

In this paper, the Power Saving Algorithm (PSA) [4][5] checks the utilization of the servers. In the first stage, we allocate the workload to the VMs according to their Benchmark Scores; in the second stage, we determine the utilization of each VM. In Figure 4, we can see a huge difference in workload between a VM with Benchmark Score 100 and one with Benchmark Score 5, yet they use almost the same energy. This paper presents PSA to find the balance between efficiency and energy management.

$$T = \frac{a \cdot B_1}{\sum_{j=1}^{n} B_j} + \varepsilon \qquad (3.4)$$

$$E_{VM} = T \times N_{VM} \times P_{VM}, \quad N_{VM} \in \{1, 2, \dots, n\} \qquad (3.5)$$

$$V = \frac{E_{n-1} \, T_{n-1}}{E_n \, T_n}, \quad n \geq 2 \qquad (3.6)$$

where T is the system computing time, a is the time one virtual machine needs to complete the job alone, B is the Benchmark Score of one virtual machine, and ε is the error time. E is the energy (J) of the virtual machines, P is the power (W) of a virtual machine, and V is the ratio of energy consumption.

3.6 Decision Algorithm

The Decision Algorithm [6] in GMS judges whether the result from PSA is reasonable. The formula is as follows:

[Equation (3.7) is missing in the source.]

where α is the system consumption with PSA and [symbol missing] is the consumption without PSA. If [symbol missing] is greater than α, the system goes back to Load Balance Optimization.
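The PSA quantities of Section 3.5 can be tabulated with a short Python sketch. It is illustrative only and rests on assumptions: it reads (3.4) as T = a·B_1 / ΣB_j + ε, (3.5) as E = T·N·P, and (3.6) as the ratio of the energy-time product with n-1 VMs to that with n VMs. The function name, the VM scores, the 600 s single-VM time and the 40 W per-VM power are hypothetical values, not measurements from the paper.

```python
def psa_table(a, scores, p_vm, eps=0.0):
    """Return rows (n, T, E, V) for n = 1..len(scores) VMs.

    a: time in seconds for the highest-scoring VM to finish the job alone.
    scores: Benchmark Scores of the available VMs.
    p_vm: power in watts drawn by each VM (assumed equal for all VMs).
    eps: error time in seconds.
    """
    ranked = sorted(scores, reverse=True)     # use the best VMs first
    rows, prev_et = [], None
    for n in range(1, len(ranked) + 1):
        t = a * ranked[0] / sum(ranked[:n]) + eps           # assumed reading of (3.4)
        e = t * n * p_vm                                    # (3.5), joules
        v = None if prev_et is None else prev_et / (e * t)  # assumed reading of (3.6)
        rows.append((n, t, e, v))
        prev_et = e * t
    return rows


rows = psa_table(a=600.0, scores=[100, 80, 60, 40, 15, 5], p_vm=40.0)
for n, t, e, v in rows:
    print(n, round(t, 1), round(e, 1), None if v is None else round(v, 3))
```

Under this reading, V above 1 means that adding the n-th VM lowers the energy-time product and V below 1 means that it raises it, so the low-score VMs at the end of the ranking are the candidates for switching off.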

4 Simulation Result

Fig. 3. Experiment Environment

Figure 4 shows the highest-performance virtual machine in the experiment environment of this paper.

4.1 The Relationship between System Computing Time and System Consumption

Fig. 4. System Time and Consumption (x-axis: number of VMs, 1 to 6; y-axes: System Time (s) and System Consumption (J))

In Figure 5, we can see that the crossing point between system time and system consumption lies between two VMs and three VMs. In fact, the number of VMs giving the best performance in our experiment is three.

4.2 Comparison between Original and Green Master

Fig. 5. System Computing Time (computing time in s, Original vs. Green Master, Test 1 to Test 3)

Fig. 6. Power Consumption Saving (system consumption in J, Original vs. Green Master, Test 1 to Test 3)

Fig. 7. Ratio of Power Saving (Test 1: 0.435, Test 2: 0.44, Test 3: 0.433)

In Figures 6 and 7, we use test files of several different sizes in our experiment environment. We can clearly see that the original system's computing time is less than Green Master's, but its consumption is almost twice that of Green Master.

5 Conclusion

Green Master optimizes system power consumption by lowering performance slightly. In this paper, we provide an appropriate trade-off between power saving and performance loss, and improve the energy conservation of the system.

References

[1] Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, OSDI 2004.
[2] Xia Xie, Qingcha Chen, Wenzhi Cao, Pingpeng Yuan, Hai Jin, Benchmark Object for Virtual Machines, 2010 Second International Workshop on Education Technology and Computer Science.
[3] Lars Kolb, Andreas Thor, Erhard Rahm, Load Balancing for MapReduce-based Entity Resolution, 2012 IEEE 28th International Conference on Data Engineering.
[4] Tao Zhu, Chengchun Shu, Haiyan Yu, Green Scheduling: A Scheduling Policy for Improving the Energy Efficiency of Fair Scheduler, 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies.
[5] Sandhya S V, Sanjay H A, Netravathi S J, Sowmyashree M V, Yogeshwari R N, Fault Tolerant Master-Workers framework for MapReduce Applications, 2009 International Conference on Advances in Recent Technologies in Communication and Computing.
[6] Congchong Liu and Shujia Zhou, Local and Global Optimization of MapReduce Program Model, 2011 IEEE World Congress on Services.
[7] Qinlu He, Zhanhuai Li, Xiao Zhang, Study on Cloud Storage System based on Distributed Storage Systems, Computational and Information Sciences (ICCIS), 17-19 Dec. 2010, pp. 1332-1335.
[8] Kevin D. Bowers, Ari Juels, and Alina Oprea, HAIL: A High-Availability and Integrity Layer for Cloud Storage, Computer and Communications Security (CCS), Nov. 2009, pp. 187-198.
[9] Mingyue Luo, Gang Liu, Distributed log information processing with Map-Reduce: A case study from raw data to final models, Information Theory and Information Security (ICITIS), 17-19 Dec. 2010, pp. 1143-1146.