An Efficient Job Scheduling for MapReduce Clusters

International Journal of Future Generation Communication and Networking, pp. 391-398
http://dx.doi.org/10.14257/ijfgcn.2015.8.2.32
ISSN: 2233-7857 IJFGCN, Copyright © 2015 SERSC

Jun Liu 1, Tianshu Wu 1, Ming Wei Lin 1 and Shuyu Chen 2
1 College of Computer Science, Chongqing University, Chongqing, China
2 College of Software Engineering, Chongqing University, Chongqing, China
liujuncqcs@163.com, netmobilab@cqu.edu.cn, wutianshu@cqu.edu.cn, linmwcs@163.com

Abstract

Job scheduling for MapReduce clusters has received significant attention in recent years because it plays an important role in cluster performance. Traditional job scheduling performs poorly in assigning tasks to appropriate nodes and cannot predict the resource utilization of unexecuted tasks. To address these problems, this paper proposes an efficient job scheduling for MapReduce clusters. The job scheduling introduces dynamic priority scheduling and a real-time prediction model. Dynamic priority scheduling introduces the minimum cost data locality algorithm with a weight to deal with jobs of different sizes, and the real-time prediction model predicts the resource utilization of unexecuted tasks from measurements of the running tasks. The resource utilization covers CPU, memory, and network. Experimental results show that the proposed job scheduling performs well in MapReduce clusters.

Keywords: job scheduling, minimum cost data locality algorithm, dynamic priority scheduling

1. Introduction

Because it is extremely easy to program, fast, scalable, and fault-tolerant for a variety of applications, MapReduce [1-3] has been widely regarded as a promising approach to large-scale data analysis such as graph processing, machine learning, and data mining. Applications submitted to MapReduce clusters are executed in the form of jobs, and each job contains a number of tasks. Every task is assigned by the task scheduler to a node, generally called a slave node. The task scheduler is one of the core technologies of MapReduce; it mainly controls the order of task execution and resource allocation. In addition, it directly influences the performance of MapReduce clusters and the execution time of tasks with different priorities. Therefore, appropriate task scheduling is very important for MapReduce clusters.

MapReduce itself provides three main task scheduling algorithms: first-in-first-out (FIFO), capacity scheduling [4], and fair scheduling [5]. FIFO is the built-in scheduler in MapReduce clusters. Its advantage is that it is simple and easy to implement, because it handles jobs in first-in-first-out order: older jobs are handled first and younger jobs later. However, it does not fully take into account that clusters contain jobs of different sizes, both small and large, and it does not consider support for multiple users. To address the problems of FIFO, fair scheduling was developed to treat small and large jobs as fairly as possible. To achieve this goal, job priorities, pool weights, and delay scheduling are introduced. The scheduling of jobs is controlled by job priorities with a suitable weight, and weights are divided into levels, such as a weight of 1.0, a weight of 2.0, and 2x more weight. But fair scheduling needs a lot of manual configuration, which can greatly influence the performance of jobs. Moreover, fair scheduling does not take into account the actual load of the master node, which performs job scheduling and distributes jobs to a number of slave nodes. Capacity scheduling supports multiple job queues. However, capacity scheduling limits the resources that a job can use.

For the above reasons, it is clear that a valid job scheduling must jointly consider performance, support for multiple users, and effective utilization of resources. Following this design principle, an efficient job scheduling is proposed that aims to improve the performance of clusters and to accommodate jobs of different sizes. The major contributions of this paper can be summarized as follows:

(1) To improve the performance of MapReduce clusters, a real-time prediction model is used to predict the unexecuted tasks. This model estimates the resource consumption of the unexecuted tasks from the currently running jobs.

(2) To accommodate jobs of different sizes, the minimum cost data locality algorithm is used to calculate the degree of data locality. The algorithm efficiently avoids the problem caused by jobs of different sizes in clusters, and it also improves the efficiency of job scheduling.

To evaluate the effectiveness of the proposed job scheduling, a prototype has been implemented and various benchmarks have been conducted. The simulation results show that the proposed job scheduling significantly improves the performance of MapReduce clusters.

The rest of this paper is organized as follows. Section 2 discusses related work. Section 3 presents the efficient job scheduling. Section 4 presents the experimental results. Section 5 concludes the paper.

2. Related Work

A number of job scheduling algorithms have been proposed in recent years. This section briefly summarizes research work related to job scheduling in MapReduce clusters.

Zaharia et al. [6] propose a job scheduling that introduces the delay scheduling algorithm to improve data locality. However, the algorithm does not take into account the different sizes of jobs in a cluster. Moreover, the algorithm performs well for small jobs and poorly for large jobs, because delay scheduling can incur performance degradation in large jobs. In MapReduce, large jobs are divided into a fixed number of tasks, and many tasks wait in queues due to the delay characteristics of the delay scheduling algorithm. So the execution time of large jobs increases.

Jinshuang Yan et al. [10] also propose a job scheduling, which reduces the time cost of the initial phase of a job and of task assignment because the push mode replaces the pull mode as the task assignment mechanism. But it cannot perform well for large jobs, because it is designed for small jobs.

Seo et al. [7] propose a new job scheduling, which introduces prefetching technologies to improve data locality and the performance of clusters. But memory consumption and network throughput increase considerably, because data unrelated to the task is read into memory. In addition, a mass of data is transmitted through the network due to the storage characteristics of HDFS [8-9].

Aprigio Bezerra et al. [11] propose a job scheduling that selects tasks from pending job queues by analyzing the available resources of the cluster to improve its performance. Although analyzing the available resources makes it possible to submit a task to the cluster appropriately, it cannot predict the resource consumption of the pending jobs.

Jisha S. Manjaly [12] also proposes a job scheduling, called the TaskTracker aware scheduling algorithm, which aims to avoid task failures caused by overloading in clusters, and in which users must configure the maximum load for every task by setting a threshold. However, the drawback of this job scheduling is the large amount of configuration required from users, so it is very difficult to use.

The above-mentioned job scheduling algorithms share common drawbacks, which are, respectively, not predicting the resource consumption of the unexecuted or pending tasks and not calculating the degree of data locality.

3. An Efficient Job Scheduling

Based on the analysis of the above job scheduling algorithms in MapReduce clusters, an optimized job scheduling is presented in this section. The goal of the job scheduling is to assign resources to jobs fairly. Small and large jobs are reasonably assigned to nodes by analyzing the actual resource utilization through dynamic priority scheduling. Moreover, the job scheduling can predict the resource utilization of jobs that have not yet been performed by analyzing the performed jobs.

3.1. Dynamic Priority Scheduling

The core of the job scheduling is dynamic priority scheduling, which introduces the minimum cost data locality algorithm with a weight to deal with jobs of different sizes. The weight of a job is defined as

$$ W = \mathit{Locality} \times \mathit{Priority} \tag{1} $$

where $\mathit{Priority}$ is the priority of a job and $\mathit{Locality}$ is the degree of data locality. The formula shows that the weight of a job combines its data locality and its priority. The priority of a job is the job execution order defined by users. Data locality means that the data of a job is stored on the nodes where the job is executed. In this paper, the data locality algorithm extends the host selection algorithm in Hadoop [13]: if the data of a job is spread over different nodes, the proposed job scheduling calculates the minimum cost and assigns the job to appropriate nodes using the minimum cost data locality algorithm.

Assume that a job contains $M$ blocks. The $M$ blocks are stored on different nodes, and the numbers of blocks on each node are denoted as $N_1, N_2, \ldots, N_n$ respectively.
The distances from each node to the task scheduling node (JobTracker) are denoted as $D_1, D_2, \ldots, D_n$ respectively. $T$ represents the time cost of transferring a block to the node with the maximum number of blocks. This node is chosen because the first executed task is assigned to the node with the maximum number of blocks. $T$ depends on the size of blocks, the number of blocks, and the actual network transmission speed. The size of blocks is denoted as $\mathit{Block}_{size}$ (64 MB by default), the number of blocks on node $i$ is denoted as $N_i$, and the actual network transmission speed is denoted as $\mathit{speed}$. So $T$ for node $i$ can be expressed as

$$ T_i = \frac{N_i \times D_i \times \mathit{Block}_{size}}{\mathit{speed}} \tag{2} $$

Assume that the data blocks of the executed job are distributed over $n$ nodes, whose localities are denoted as $\mathit{Locality}_1, \mathit{Locality}_2, \ldots, \mathit{Locality}_n$ respectively. So the locality is given by

$$ \mathit{Locality}_i = \frac{1}{N_i} (B_f) \, T_i \tag{3} $$

where $B_f$ is the number of data blocks that are transmitted through the network. According to formulas (2) and (3), the locality is equal to

$$ \mathit{Locality}_i = \frac{1}{N_i} (B_f) \, \frac{N_i \times D_i \times \mathit{Block}_{size}}{\mathit{speed}} \tag{4} $$

The minimum cost data locality algorithm selects the $P$ nodes with the smallest values in $\{\mathit{Locality}_1, \mathit{Locality}_2, \ldots, \mathit{Locality}_n\}$. Users can predefine the value of $P$. The $P$ nodes are sorted from smallest to largest locality. In MapReduce clusters, a job is divided into a fixed number of tasks, and each task is assigned to a node.

Figure 1 illustrates an example of the minimum cost data locality algorithm. In this example, a job is divided into 2 tasks. There are 5 data blocks on node 1, 2 data blocks on node 2, and 3 data blocks on node 3. The locality list is accessed in the order Locality 1, Locality 3, Locality 2. Therefore task 1 is assigned to node 1 and task 2 is assigned to node 3. The locality of node 3 is smaller than that of node 2 because node 3 holds more blocks than node 2. Figure 1 shows that no task runs on node 2. Assume that the distance between node 1 and node 2 is smaller than the distance between node 2 and node 3. Therefore the data blocks on node 2 are processed by task 1 on node 1.

Figure 1. The Example of the Minimum Cost Data Locality Algorithm (tasks Task 1 to Task n; the locality list ordered from smallest to largest as Locality 1, Locality 3, Locality 2; nodes Node 1 to Node n; the legend distinguishes the currently executing task from the tasks which have not been executed)

3.2. The Real-Time Prediction Model of Jobs

In the real-time prediction model, the resource utilization of the unexecuted tasks is inferred from the executed tasks. These resources include CPU, memory, and network resources. Assume that a job is divided into ten tasks, denoted as $Task_1, Task_2, \ldots, Task_n$, whose lengths are denoted as $L_1, L_2, \ldots, L_n$ respectively; six of them are running and the other four are waiting.
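For the minimum cost data locality algorithm of Section 3.1, a small Python sketch may help make formulas (2) to (4) and the Figure 1 example concrete. It is only an illustration under stated assumptions, not the authors' implementation: the names `Node`, `locality`, and `select_nodes` are hypothetical, $B_f$ is assumed to be the number of the job's blocks that are not stored on the node (so they would have to cross the network), all distances are set to 1, and the speed of 12.5 MB/s simply corresponds to a 100 Mb/s link.

```python
from dataclasses import dataclass

BLOCK_SIZE_MB = 64.0  # HDFS default block size mentioned in the text


@dataclass
class Node:
    name: str
    blocks: int      # N_i: blocks of the job stored on this node
    distance: float  # D_i: distance to the scheduling node (unitless here)


def locality(node, total_blocks, speed_mb_s):
    """Locality_i following formulas (2)-(4); smaller means cheaper."""
    # Formula (2): time cost term for node i.
    t = node.blocks * node.distance * BLOCK_SIZE_MB / speed_mb_s
    # Assumption of this sketch: B_f = blocks that are NOT on node i.
    b_f = total_blocks - node.blocks
    # Formula (3): Locality_i = (1 / N_i) * B_f * T_i.
    return (1.0 / node.blocks) * b_f * t


def select_nodes(nodes, p, speed_mb_s=12.5):
    """Return the names of the P nodes with the smallest locality, ascending."""
    total = sum(n.blocks for n in nodes)
    # Nodes holding no block of the job are skipped (avoids dividing by zero).
    candidates = [n for n in nodes if n.blocks > 0]
    ranked = sorted(candidates, key=lambda n: locality(n, total, speed_mb_s))
    return [n.name for n in ranked[:p]]


# Block counts from the Figure 1 example: 5, 2 and 3 blocks on nodes 1, 2, 3.
nodes = [Node("node1", 5, 1.0), Node("node2", 2, 1.0), Node("node3", 3, 1.0)]
print(select_nodes(nodes, p=2))
```

With the Figure 1 block counts and equal distances, the sketch ranks node 1 first, then node 3, then node 2, which is the order of the locality list in the example, so with P = 2 the two tasks go to node 1 and node 3.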
The resource consumption of a task $Task_i$ with the length $L_i$ on a node is derived as

$$ c = \mathit{cpu} + \mathit{memory} + \mathit{network} \tag{5} $$

where $\mathit{cpu}$ stands for the CPU consumption, $\mathit{memory}$ stands for the memory consumption, and $\mathit{network}$ stands for the consumption of network transmission. In addition, $\mathit{memory}$ contains the number of memory bytes consumed by the task itself, denoted as $t$, the number of memory bytes consumed by storing local data, denoted as $l$, and the number of memory bytes consumed by storing network data, denoted as $n$. So $\mathit{memory}$ is derived in formula (6) from $t$, $l$, and $n$ together with a term in $\log(T_{TotalTime})$ that involves $L$ and $a$, where $L$ is the size of temporary data written to the local disk (the task writes its in-memory data to the local disk when memory usage reaches a certain threshold) and $a$ is a regulator. When no data is transmitted over the network, the value of $n$ is zero. According to formulas (5) and (6), $c$ is obtained in formula (7) by substituting the memory expression of formula (6) into formula (5), where $T_{TotalTime}$ is the total running time of the job.

When a running task $T_r$ with the length $L$ is completed, the resource consumption $c$ of $T_r$ is calculated. The real-time prediction model then selects from the waiting list an unexecuted task whose size is equal to $L$ and assigns it to the node where $T_r$ was located. If no waiting task exactly matches the size, the closest match is used.

Figure 2. The Example of the Real-Time Prediction Model (currently running tasks Task 1 to Task 6 with lengths 10, 9, 34, 45, 34, 23 on Node 1 to Node 6; waiting tasks Task 7 to Task 10 with lengths 9, 8, 23, 45)

Figure 2 illustrates an example of the real-time prediction model. Suppose Task 1, Task 2, Task 3, Task 4, Task 5, and Task 6 are running, and Task 7, Task 8, Task 9, and Task 10 are waiting. As shown in Figure 2, task 2 is completed. The size of task 2 is 9. In the waiting list, the size of task 7 is 9. So assigning task 7 to Node 3 is the most efficient assignment, because the resources on Node 3, including CPU, memory, and network, can meet the requirements of task 7.

4. Experimental Evaluations

4.1. Experimental Environment

To evaluate the effectiveness of the proposed job scheduling, it is compared with existing job scheduling algorithms, namely the first-in-first-out (FIFO) algorithm and the fair scheduler. To evaluate the performance of the proposed job scheduling, an experimental MapReduce cluster running Hadoop 1.0.0 is established.
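The task-matching rule of the real-time prediction model in Section 3.2 (pick the waiting task whose length equals, or is closest to, the length of the task that just completed, and reuse the measured consumption as the prediction) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: `Consumption`, `pick_closest`, and `on_task_completed` are hypothetical names, the consumption values are made up, and the total consumption is read as a plain sum of its three parts, which is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Consumption:
    cpu: float
    memory: float
    network: float

    def total(self):
        # Formula (5), read here as a plain sum (an assumption of this sketch).
        return self.cpu + self.memory + self.network


def pick_closest(completed_length, waiting):
    """Return the name of the waiting task whose length is closest to
    completed_length; an exact match has distance 0 and therefore wins."""
    if not waiting:
        return None
    return min(waiting, key=lambda name: abs(waiting[name] - completed_length))


def on_task_completed(node, length, measured, waiting):
    """Assign the best-matching waiting task to the node that just became
    free and return (task, node, predicted consumption)."""
    task = pick_closest(length, waiting)
    if task is None:
        return None
    del waiting[task]  # the task leaves the waiting list
    # The measured consumption of the finished task is the prediction.
    return task, node, measured.total()


# Waiting list of the Figure 2 example: task name -> length.
waiting = {"Task 7": 9, "Task 8": 8, "Task 9": 23, "Task 10": 45}
# Task 2 (length 9) has just completed; the consumption values are invented.
print(on_task_completed("Node 3", 9, Consumption(0.5, 0.25, 0.125), waiting))
```

For the Figure 2 data the sketch selects Task 7, whose length of 9 matches the completed task exactly. Ties between equally close tasks are broken by the order of the waiting list, which is a design choice of the sketch and not something the text specifies.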
The experiment cluster contains one master node and 9 slave nodes. These nodes are connected by a 100 Mb/s network. The master node is configured with 4-core 3.20 GHz Intel i7-960 processors, 16 GB of memory, and one 1 TB 5400 RPM SATA disk. Each slave node is equipped with 4-core 3.10 GHz Intel i5-2400 processors, 16 GB of memory, and three 1 TB 5400 RPM SATA disks. They all run CentOS 6.4 with kernel 2.6.32-358.el6.x86_64. Each disk is formatted with the ext4 file system. The master node acts as JobTracker, SecondaryNameNode, and NameNode. Each slave node acts as DataNode and TaskTracker.

WordCount [14] and TeraSort [15] [16] benchmarks are performed in the experimental environment. These two benchmarks are selected because WordCount and TeraSort are often used as baseline benchmarks for MapReduce. The test data comes in four sizes: 500 MB, 1 GB, 2 GB, and 5 GB. To distribute the test data evenly across the slave nodes, the HDFS block size is set to the default value of 64 MB, and the replication factor of a block is set to 3.

4.2. Experiment Result

Figure 3 and Figure 4 show the experiment results of the three job scheduling algorithms in terms of execution time. As shown in Figure 3 and Figure 4, the proposed job scheduling performs better than FIFO and the fair scheduler in terms of execution time, because each task is assigned to a reasonable slave node through dynamic priority scheduling and the real-time prediction model. When the size of the test data is 5 GB, the advantage of the proposed scheduling is more apparent, because there is a large number of waiting tasks in the cluster. In this case, the proposed job scheduling can predict the resource usage of the waiting tasks and assign a waiting job to the most suitable slave node. This assignment uses the cluster resources effectively. Moreover, the weight of a job noticeably reduces network data transmission, because a novel data locality is introduced.
FIFO and the fair scheduler spend a lot of time on network data transmission, which increases the total running time of the job.

Figure 3. WordCount Job Execution Time (x-axis: data size, 500 MB, 1 GB, 2 GB, 5 GB; y-axis: execution time in seconds, 0 to 16000; series: FIFO, fair scheduler, the proposed scheduling)

Figure 4. TeraSort Job Execution Time (x-axis: data size, 500 MB, 1 GB, 2 GB, 5 GB; y-axis: execution time in seconds, 0 to 16000; series: FIFO, fair scheduler, the proposed scheduling)

Figure 3 and Figure 4 show that the execution time of TeraSort is larger than that of WordCount. The reason is that TeraSort carries shuffle data from one slave node to another. This process generates heavy disk I/O and network throughput. In addition, there is a large amount of shuffle data in the shuffle stage. In this case, the major bottleneck is network I/O. For WordCount, there is only a small amount of shuffle data in the shuffle stage.

5. Conclusion

This paper presents a novel job scheduling for MapReduce clusters. The objectives of the proposed scheduling are to reduce the execution time of jobs and to take full advantage of the resources of each node. The proposed job scheduling is advantageous in execution time and resource utilization because it calculates the minimum cost and assigns the job to appropriate nodes using the minimum cost data locality algorithm. Moreover, the resource utilization of the unexecuted tasks is inferred from the executed tasks. A series of experiments are conducted and encouraging results are obtained.

Acknowledgements

We are grateful to the editors and anonymous reviewers for their valuable comments on this paper. This work is supported by the National Natural Science Foundation of China (Grant No. 61272399) and the Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20110191110038).

References

[1] J. Dean and S. Ghemawat, "Simplifying MapReduce data processing", Proceedings of the 4th IEEE International Conference on Utility and Cloud Computing, (2011) December 5-8, pp. 366-370, Melbourne, Australia.
[2] X. Kaiqi and H. Yuxiong, "Power-efficient resource allocation in MapReduce clusters", Proceedings of the IFIP/IEEE International Symposium on Integrated Network Management, (2013) May 27-31, pp. 603-608, Ghent, Belgium.
[3] Apache Software Foundation, Official Apache Hadoop Website. URL http://hadoop.apache.org/ Accessed date, (2012) July 1.
[4] Capacity Scheduler, Tech. rep., Retrieved: (2012) February. http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html
[5] Fair Scheduler, Tech. rep., Retrieved: (2012) February. http://hadoop.apache.org/common/docs/r0.20.2/fair_scheduler.html
[6] M. Zaharia, D. Borthakur, J. S. Sarma, K. Elmeleegy, S. Shenker and I. Stoica, "Delay scheduling: a simple technique for achieving fairness", Proceedings of 16th EuroSys Conference, (2010) March 1-5, pp. 265-278, Paris, France.
[7] S. Seo, I. Jang, K. Woo, I. Kim, J. S. Kim and S. Maeng, "HPMR: Prefetching and pre-shuffling in shared MapReduce computation environment", Proceedings of the IEEE International Conference on Cluster Computing and Workshops, (2009) August 31-September 4, New Orleans, United States.

[8] HDFS homepage. http://hadoop.apache.org/hdfs/
[9] S. Ghemawat, H. Gobioff and S. Leung, "The Google File System", Proceedings of the 19th ACM Symposium on Operating Systems Principles, vol. 37, no. 5, (2003) October 19-22, pp. 29-43, Lake George, United States.
[10] J. Yan, X. Yang, R. Gu, C. Yuan and Y. Huang, "Performance Optimization for Short MapReduce Job Execution in Hadoop", Proceedings of the 2nd International Conference on Cloud and Green Computing and 2nd International Conference on Social Computing and Its Applications, (2012) November 1-3, pp. 688-694, Xiangtan, China.
[11] A. Bezerra, P. Hernández and A. Espinosa, "Job Scheduling for Optimizing Data Locality in Hadoop Clusters", Proceedings of the 20th European MPI Users' Group Meeting, (2013) September 15-18, pp. 271-276, Madrid, Spain.
[12] J. S. Manjaly and V. S. Chooralil, "Task Tracker Aware Scheduling for Hadoop MapReduce", Proceedings of the 3rd International Conference on Advances in Computing and Communications, (2013) August 29-31, pp. 278-281, Kochi, India.
[13] Hadoop homepage. http://hadoop.apache.org/.
[14] WordCount program. Available in Hadoop source distribution: src/examples/org/apache/hadoop/examples/wordCount.
[15] Hadoop TeraSort program. Available in Hadoop source distribution since version 0.19: src/examples/org/apache/hadoop/examples/terasort.
[16] TeraSort. http://sortbenchmark.org/.

Authors

Jun Liu received his B.S. degree from Southwest University, P. R. China, in 2001, and his M.S. degree from Chongqing University, P. R. China, in 2009. Currently he is a Ph.D. candidate in the College of Computer Science at Chongqing University. His current interests include big data analytics, flash memory, information security, and the Linux kernel.

ShuYu Chen received his Ph.D. degree from Chongqing University, P. R. China, in 2001. Currently, he is a professor in the College of Software Engineering at Chongqing University. His research interests include embedded Linux systems, distributed systems, cloud computing, etc. He has published over 120 journal and conference papers in related research areas in recent years.

Tianshu Wu received his B.S. degree from Chongqing University of Posts and Telecommunications, P. R. China, in 2011. He is currently a Ph.D. candidate in the College of Computer Science at Chongqing University. His current interests include cloud computing, large-scale data mining, and fault detection.

MingWei Lin received his B.S. degree from Chongqing University, P. R. China, in 2009. He is currently a Ph.D. candidate at Chongqing University. He has served as an invited reviewer for the Journal of Systems and Software and for Computers and Electrical Engineering. His current interests include large-scale data mining, flash memory, the Linux kernel, information security, and wireless sensor networks.