A REAL TIME MEMORY SLOT UTILIZATION DESIGN FOR MAPREDUCE MEMORY CLUSTERS

Suma R 1, Vinay T R 2, Byre Gowda B K 3
1 Post Graduate Student, CSE, SVCE, Bangalore
2 Assistant Professor, CSE, SVCE, Bangalore
3 Assistant Professor, CSE, SIR.MVIT, Bangalore

ABSTRACT

MapReduce is a model for processing large data sets in the cloud: the input is split into multiple parts, which are assigned to slots and processed by map tasks and then reduce tasks. Slot-based MapReduce often performs poorly because of unoptimized resource allocation, and it faces several challenges. MapReduce job execution has two unique features: map slots can be allocated only to map tasks and reduce slots only to reduce tasks, and map tasks are processed before reduce tasks. Maximizing data locality is also required to improve efficiency and utilization. DynamicMR is a dynamic slot allocation framework that addresses these problems and improves the performance of MapReduce [1]. DynamicMR focuses on the Hadoop Fair Scheduler (HFS). It consists of three optimization techniques: Dynamic Hadoop Slot Allocation (DHSA), Speculative Execution Performance Balancing (SEPB), and Slot Prescheduling.

1. INTRODUCTION

Big data is a collection of structured and unstructured data that is too large, too fast-moving, and too diverse to be managed by traditional database management tools or traditional data processing applications. Hadoop is an open-source software framework from Apache that supports scalable distributed applications. It runs applications on large clusters of commodity hardware and provides fast and reliable analysis of both structured and unstructured data. Hadoop uses a simple programming model and can scale from a single server to thousands of machines, each offering local computation and storage.
Despite many studies on optimizing MapReduce/Hadoop, several key challenges remain for improving the utilization and performance of a Hadoop cluster.

First, the resources (e.g., CPU cores) are abstracted as map and reduce slots. A MapReduce job execution has two unique features:
1. The slot allocation constraint: map slots are allocated only to map tasks and reduce slots only to reduce tasks.
2. Map tasks are executed first, followed by reduce tasks.
This leads to two observations:
1. Different slot configurations yield different system utilization and performance for a MapReduce workload.
2. Idle reduce slots degrade performance and system utilization.

Second, a straggling map or reduce task delays the whole job.

Third, maximizing data locality improves the performance and slot utilization efficiency of a MapReduce workload.

DynamicMR comprises three techniques: Dynamic Hadoop Slot Allocation (DHSA), Speculative Execution Performance Balancing (SEPB), and Slot Prescheduling (SP).

DHSA
1. Slots can be used for either map tasks or reduce tasks. If there are insufficient map slots for the map tasks, the map tasks can borrow unused reduce slots. Similarly, reduce tasks can borrow unused map slots if the number of reduce tasks is greater than the number of reduce slots.
2. Otherwise, map slots are used for map tasks and reduce slots are used for reduce tasks.

SEPB
Speculative execution is used to find slow-running tasks. We propose a technique called Speculative Execution Performance Balancing for these speculative tasks. It balances the performance tradeoff between a single job and a batch of jobs.

Slot Prescheduling
An approach for improving data locality in MapReduce. Existing locality techniques achieve this at the cost of fairness; slot prescheduling aims to avoid that cost (see Section 3.3).

DynamicMR improves the performance and utilization of MapReduce workloads by 46% - 115% for a single job and 49% - 112% for multiple jobs.

[Fig-1: Overview of DynamicMR Framework. Labels: Slot Utilization Optimization; Utilization Efficiency Optimization; Dynamic Hadoop Slot Allocation (DHSA) with PI-DHSA and PD-DHSA; Speculative Execution Performance Balancing (SEPB); Slot PreScheduling; Map Task; Reduce Task; Idle Slot]

MapReduce is popular in industry, bioinformatics, and machine learning. Hadoop is an implementation of MapReduce [6]. In MapReduce, multiple tasks run on each node, and each node hosts a configured number of map and reduce slots. When a task is assigned to a slot, the slot becomes occupied; when the task completes, the slot is released. Resource underutilization can be overcome by resource stealing. MapReduce uses speculative execution to support fault tolerance: the master node tracks the progress of all scheduled tasks, and when it finds a slow-running task, it launches a speculative task to process the same work faster. Processing huge data sets requires both a framework and powerful hardware; Google proposed MapReduce for parallel data processing. MapReduce takes a dynamic and aggressive approach. Fairness and data locality sometimes conflict with each other: strict fairness degrades data locality, while pursuing data locality alone results in unfair resource usage. MapReduce is a programming model for large-scale data processing [2] and has been reported to process 20 petabytes of data per day. Hadoop is the open-source implementation of MapReduce and is used by companies such as Facebook.
MapReduce uses a distributed storage layer referred to as the Hadoop Distributed File System (HDFS). A user submits a job comprising a map function and a reduce function, which are transformed into map tasks and reduce tasks, respectively. HDFS splits the data into equal-sized blocks and distributes them across the cluster nodes, where the map tasks process them. The intermediate output is partitioned across one or more reduce tasks. The Locality-Aware Reduce Task Scheduler (LARTS) takes partition sizes into account to improve data locality.

2. EXISTING SYSTEM

Scheduling and resource allocation optimization

There is existing work on scheduling and resource allocation for MapReduce jobs. In MapReduce, one job consists of multiple tasks. This work assumes that all jobs arrive at the same time, and its objective is to minimize the job completion time. To achieve this, a computation model is developed to solve large-scale data problems, using graph analysis, and MapReduce is modeled as a two-stage hybrid flow shop. Optimizing job submission in this model improves system performance and utilization. However, it requires the execution times of map and reduce tasks to be known in advance, which is not possible in real-world applications. DHSA, in contrast, can be used for any MapReduce workload. Even under an optimal Hadoop configuration (e.g., the map/reduce slot configuration), there is still room for improving the performance of a MapReduce workload.

Guo et al. propose a method called resource stealing [3], which steals resources that are reserved for idle slots by adopting a multi-threading technique for tasks running on multiple CPU cores. Polo et al. propose a resource-aware scheduling technique for MapReduce workloads, which improves resource utilization. DHSA improves system utilization by allocating unused map and reduce slots.

YARN is the new version of Hadoop and overcomes the inefficiency problem of Hadoop. It manages resources such as memory and bandwidth. However, for multiple jobs DynamicMR performs better than YARN, because YARN has no concept of slots.

Speculative Execution optimization

Speculative execution is used to deal with the straggler problem. Longest Approximate Time to End (LATE) is a speculative execution algorithm that focuses on heterogeneous environments and caps the number of speculative tasks. Guo et al. improve the performance of LATE by proposing Benefit Aware Speculative Execution (BASE), which considers the benefit of speculative tasks. We propose SEPB to balance the tradeoff between a single job and a batch of jobs.

Data Locality Optimization

Data locality optimization improves the efficiency and performance of cluster utilization [4]. MapReduce has a map side and a reduce side. Map-side data locality optimization moves map tasks close to their input data blocks. MapReduce jobs are classified into map-input heavy, map-and-reduce-input heavy, and reduce-input heavy. Reduce-side data locality places reduce tasks on the machines that hold the intermediate data generated by the map tasks. Slot prescheduling belongs to map-side data locality optimization. It uses extra idle slots to maximize data locality while preserving fairness. The delay scheduler and slot prescheduling are both used to achieve fairness and data locality. SEPB and slot prescheduling are two slot optimizers that complement DHSA.

MapReduce optimization on cloud computing

DynamicMR is a fine-grained optimization for Hadoop. By combining existing systems with DynamicMR, a framework that also accounts for budget in cloud computing can be developed.

3. PROPOSED SYSTEM

MapReduce performance can be improved from two perspectives. First, slots are classified into busy slots and idle slots, so one approach is to increase slot utilization by maximizing the number of busy slots and minimizing the number of idle slots. Second, not every busy slot is used efficiently, so the other approach is to improve the utilization efficiency of busy slots. DHSA increases slot utilization while maintaining fairness [4]. SEPB improves the handling of slow-running tasks. Slot prescheduling [10] improves performance through data locality while preserving fairness.
DynamicMR works through the following step-by-step process:
1. When there is an idle slot, DynamicMR first tries to improve slot utilization with DHSA. It decides whether or not to allocate the slot subject to constraints such as fairness.
2. If the slot is to be allocated, DynamicMR improves the efficiency of the slot with SEPB. Speculative execution achieves a performance tradeoff between a single job and a batch of jobs.
3. When idle slots are allocated to pending map tasks, DynamicMR improves the efficiency of slot utilization with slot prescheduling.

3.1 Dynamic Hadoop Slot Allocation:

The current MapReduce design suffers from under-utilization of slots, because the number of map and reduce tasks varies over time. At some points the number of map (or reduce) tasks is greater than the number of map (or reduce) slots. When reduce tasks are overloaded, they can use unused map slots, which improves MapReduce performance. Conversely, when all of the workload lies on the map side, idle reduce slots are used for map tasks. Map and reduce tasks can therefore run on either map slots or reduce slots. Two points must be considered:
1. Fairness is important in HFS: the allocation is fair when all pools are allocated an equal amount of resources.
2. The resource requirements of map slots and reduce slots are different [9]. Reduce tasks mainly consume memory and network bandwidth.

DHSA has two alternatives, namely PI-DHSA and PD-DHSA.

Pool-Independent DHSA (PI-DHSA): The PI-DHSA process (Fig-2: Pool-Independent DHSA) consists of two parts:

1. Intra-phase dynamic slot allocation: Each pool is divided into two sub-pools, i.e., a map-phase pool and a reduce-phase pool. A pool that is overloaded and has slot demand can borrow unused slots from other pools of the same phase. E.g., map-phase pool 2 can borrow map slots from map-phase pool 1 and pool 3.
2. Inter-phase dynamic slot allocation: When the reduce phase has unused reduce slots and there are insufficient map slots for the map tasks, the map phase borrows idle reduce slots.

Let Nm be the total number of map tasks, Nr the total number of reduce tasks, Sm the total number of map slots, and Sr the total number of reduce slots.
Case 1: Nm ≤ Sm and Nr ≤ Sr. Map tasks run on map slots and reduce tasks run on reduce slots, i.e., no slot borrowing takes place.
Case 2: Nm > Sm and Nr < Sr. Reduce slots are used for reduce tasks, and the idle reduce slots are used for running map tasks.
Case 3: Nm < Sm and Nr > Sr. Unused map slots are used for running reduce tasks.
Case 4: Nm > Sm and Nr > Sr. The system is in a busy state, and no slots move between map and reduce.
Two variables are defined for the borrowed slots: PercentageOfBorrowedMapSlots and PercentageOfBorrowedReduceSlots.

Pool-Dependent DHSA (PD-DHSA):

[Fig-3: Pool-Dependent DHSA]

Each pool consists of a map-phase pool and a reduce-phase pool and is selfish: it first satisfies the demand of its own map phase and reduce phase from its own share of map and reduce slots before yielding slots to other pools. PD-DHSA consists of two processes:

1. Intra-pool dynamic slot allocation: Within a pool there are four possible relationships.
Case a: MapSlotsDemand < MapShare and ReduceSlotsDemand > ReduceShare. The pool first borrows the unused map slots of its own map-phase pool for its overloaded reduce tasks.
Case b: MapSlotsDemand > MapShare and ReduceSlotsDemand < ReduceShare. The reduce-phase pool gives its unused slots to the pool's map tasks.
Case c: MapSlotsDemand ≤ MapShare and ReduceSlotsDemand ≤ ReduceShare. Neither phase borrows any slots, and the pool can give slots to other pools.
Case d: MapSlotsDemand > MapShare and ReduceSlotsDemand > ReduceShare. Both map slots and reduce slots are insufficient, so the pool borrows map and reduce slots from other pools.

2. Inter-pool dynamic slot allocation:
MapSlotsDemand + ReduceSlotsDemand ≤ MapShare + ReduceShare: there is no need to borrow slots from other pools.
MapSlotsDemand + ReduceSlotsDemand > MapShare + ReduceShare: even after intra-pool dynamic slot allocation the slots are not enough, so the pool borrows unused slots from other pools.

A tasktracker has four possible slot allocations.
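The four PI-DHSA inter-phase cases (Case 1 to Case 4 on Nm, Nr, Sm, Sr) can be illustrated with a minimal Python sketch. The names SlotPlan and inter_phase_allocation are hypothetical and are not taken from Hadoop or from the DynamicMR code. The sketch also assumes that a phase may borrow every idle slot of the other phase, i.e., it ignores any limit imposed by PercentageOfBorrowedMapSlots and PercentageOfBorrowedReduceSlots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotPlan:
    """Hypothetical result type: where the runnable tasks are placed."""
    map_on_map: int        # map tasks running on map slots
    map_on_reduce: int     # map tasks running on borrowed reduce slots
    reduce_on_reduce: int  # reduce tasks running on reduce slots
    reduce_on_map: int     # reduce tasks running on borrowed map slots


def inter_phase_allocation(nm: int, nr: int, sm: int, sr: int) -> SlotPlan:
    """Sketch of inter-phase borrowing.

    nm, nr: total number of map / reduce tasks (Nm, Nr).
    sm, sr: total number of map / reduce slots (Sm, Sr).
    Assumption: all idle slots of the other phase may be borrowed.
    """
    # Each phase first uses its own slots.
    map_on_map = min(nm, sm)
    reduce_on_reduce = min(nr, sr)

    # Slots left idle after each phase has served its own tasks.
    idle_map = sm - map_on_map
    idle_reduce = sr - reduce_on_reduce

    # Overflowing tasks borrow idle slots of the other phase.
    # In Case 1 there is no overflow; in Case 4 there are no idle slots.
    map_on_reduce = min(nm - map_on_map, idle_reduce)
    reduce_on_map = min(nr - reduce_on_reduce, idle_map)

    return SlotPlan(map_on_map, map_on_reduce, reduce_on_reduce, reduce_on_map)


# Case 2 (Nm > Sm, Nr < Sr): 4 idle reduce slots are lent to map tasks.
print(inter_phase_allocation(nm=10, nr=2, sm=6, sr=6))
```

Using min() for both the own-slot and the borrowed-slot counts makes the four cases fall out of one formula instead of four branches, which keeps the sketch short; a real scheduler would also have to respect fairness between pools.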

[Fig-4: Slot Allocation for PD-DHSA]

Case 1: If the tasktracker has idle map slots and there are pending map tasks for the pool, it performs map task allocation.
Case 2: If Case 1 fails and the tasktracker has idle reduce slots and there are pending reduce tasks for the pool, it performs reduce task allocation.
Case 3: If Case 1 and Case 2 both fail, reduce slots are tried for map tasks.
Case 4: Map slots are allocated to reduce tasks.

3.2 Speculative Execution Performance Balancing:

The job execution time of MapReduce is very sensitive to slow-running tasks. Stragglers arise from faulty hardware and software misconfiguration. There are two types of stragglers: hard stragglers and soft stragglers.
Hard straggler: a task that goes into a deadlock state due to endless waiting for certain resources. The task must be killed, because it will never finish on its own.
Soft straggler: a task that takes much longer than a normal task but eventually completes successfully.
For a hard straggler, the task is killed and a backup task is run in its place. The straggler problem is detected by the LATE algorithm. Speculative execution reduces a job's execution time.

[Fig-5: Total number of pending map tasks and total number of pending reduce tasks]

In SEPB, failed tasks are given the highest priority first; second, pending tasks are considered. LATE handles straggling tasks: it launches a backup task and allocates a slot to it. Consider an example with 6 jobs. The speculative cap for LATE is 4, the maximum number of jobs checked for pending tasks is 4, and there are 4 idle slots. The pending tasks for J1, J2, J3, J4, J5, J6 are 0, 0, 10, 10, 15, 20, respectively, so SEPB allocates all 4 idle slots to pending tasks. SEPB works on top of LATE and is an enhancement of LATE.

3.3 Slot Prescheduling:

Slot prescheduling improves data locality [5] without a negative impact on the fairness of MapReduce jobs.
Definition 1: The available (allowable) idle map slots are the idle map slots that can be allocated to the tasktracker.
Definition 2: The extra idle map slots are the slots obtained by subtracting the used map slots and the available idle map slots.
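The six-job SEPB example in Section 3.2 can be replayed with a small Python sketch. This is only one reading of the example, not the authors' algorithm: it assumes that SEPB inspects the first max_jobs_checked jobs in order, gives idle slots to their pending tasks first, and leaves any remaining idle slots to LATE's speculative tasks, up to the speculative cap. The function name sepb_split_idle_slots and its parameters are hypothetical, and failed tasks (which the text gives the highest priority) are left out.

```python
from typing import List, Tuple


def sepb_split_idle_slots(
    pending_per_job: List[int],
    idle_slots: int,
    max_jobs_checked: int,
    speculative_cap: int,
) -> Tuple[int, int]:
    """Return (slots_for_pending_tasks, slots_for_speculative_tasks).

    Assumed policy (a simplification, see the lead-in):
    pending tasks of the first `max_jobs_checked` jobs are served first;
    leftover idle slots go to speculative tasks, limited by the cap.
    """
    # Only the first few jobs are examined for pending tasks.
    checked = pending_per_job[:max_jobs_checked]
    pending_in_window = sum(checked)

    # Pending tasks take precedence over speculative (backup) tasks.
    for_pending = min(idle_slots, pending_in_window)

    # Whatever is left may be used by LATE, up to its speculative cap.
    for_speculative = min(idle_slots - for_pending, speculative_cap)
    return for_pending, for_speculative


# Example from the text: J1..J6 have 0, 0, 10, 10, 15, 20 pending tasks.
print(sepb_split_idle_slots([0, 0, 10, 10, 15, 20],
                            idle_slots=4,
                            max_jobs_checked=4,
                            speculative_cap=4))  # prints (4, 0)
```

Under these assumptions J3 and J4 fall inside the window of four checked jobs and have 20 pending tasks between them, so all 4 idle slots go to pending tasks and none to speculative tasks, which matches the outcome stated in the example.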
Technique    Fairness    Slot Utilization    Performance
DHSA SEPB + +
DS           _           %(+)                +
SPS          +           %(+)                +

TABLE 1: +, _, and % denote benefit, cost, and efficiency, respectively.

4. CONCLUSION

The DynamicMR framework improves the performance of MapReduce workloads while maintaining fairness. Its three techniques, DHSA, SEPB, and slot prescheduling, all focus on slot utilization in a MapReduce cluster. DHSA maximizes slot utilization. SEPB identifies inefficient use of slots. Slot prescheduling improves slot utilization efficiency. Combining these techniques improves the Hadoop system.

REFERENCES

[1] Q. Chen, C. Liu, Z. Xiao. Improving MapReduce Performance Using Smart Speculative Execution Strategy. IEEE Transactions on Computers.
[2] J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI '04.
[3] Z. H. Guo, G. Fox, M. Zhou, Y. Ruan. Improving Resource Utilization in MapReduce. In IEEE Cluster '12.
[4] Z. H. Guo, G. Fox, and M. Zhou. Investigation of Data Locality and Fairness in MapReduce. In MapReduce '12, pp. 25-32.
[5] Z. H. Guo, G. Fox, and M. Zhou. Investigation of Data Locality in MapReduce. In IEEE/ACM CCGrid '12.
[6] Hadoop.
[7] M. Hammoud and M. F. Sakr. Locality-Aware Reduce Task Scheduling for MapReduce. In IEEE CLOUDCOM '11.
[8] M. Hammoud, M. S. Rehman, M. F. Sakr. Center-of-Gravity Reduce Task Scheduling to Lower MapReduce Network Traffic. In IEEE CLOUD '12.
[9] B. Palanisamy, A. Singh, L. Liu and B. Jain. Purlieus: Locality-aware Resource Allocation for MapReduce in a Cloud. In SC '11, pp. 1-11.
[10] J. Polo, C. Castillo, D. Carrera, et al. Resource-aware Adaptive Scheduling for MapReduce Clusters. In Middleware '11.


More information

CURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NON-LINEAR PROGRAMMING

CURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NON-LINEAR PROGRAMMING Journal homepage: http://www.journalijar.com INTERNATIONAL JOURNAL OF ADVANCED RESEARCH RESEARCH ARTICLE CURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NON-LINEAR PROGRAMMING R.Kohila

More information

Cloud Computing based on the Hadoop Platform

Cloud Computing based on the Hadoop Platform Cloud Computing based on the Hadoop Platform Harshita Pandey 1 UG, Department of Information Technology RKGITW, Ghaziabad ABSTRACT In the recent years,cloud computing has come forth as the new IT paradigm.

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Cloud Computing I (intro) 15 319, spring 2010 2 nd Lecture, Jan 14 th Majd F. Sakr Lecture Motivation General overview on cloud computing What is cloud computing Services

More information

A Locality Enhanced Scheduling Method for Multiple MapReduce Jobs In a Workflow Application

A Locality Enhanced Scheduling Method for Multiple MapReduce Jobs In a Workflow Application 2012 International Conference on Information and Computer Applications (ICICA 2012) IPCSIT vol. 24 (2012) (2012) IACSIT Press, Singapore A Locality Enhanced Scheduling Method for Multiple MapReduce Jobs

More information

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Image

More information

Dynamic processing slots scheduling for I/O intensive jobs of Hadoop MapReduce

Dynamic processing slots scheduling for I/O intensive jobs of Hadoop MapReduce 2012 Third International Conference on Networking and Computing Dynamic processing slots scheduling for I/O intensive jobs of Hadoop MapReduce Shiori KURAZUMI, Tomoaki TSUMURA, Shoichi SAITO and Hiroshi

More information

An efficient Mapreduce scheduling algorithm in hadoop R.Thangaselvi 1, S.Ananthbabu 2, R.Aruna 3

An efficient Mapreduce scheduling algorithm in hadoop R.Thangaselvi 1, S.Ananthbabu 2, R.Aruna 3 An efficient Mapreduce scheduling algorithm in hadoop R.Thangaselvi 1, S.Ananthbabu 2, R.Aruna 3 1 M.E: Department of Computer Science, VV College of Engineering, Tirunelveli, India 2 Assistant Professor,

More information

Network-Aware Scheduling of MapReduce Framework on Distributed Clusters over High Speed Networks

Network-Aware Scheduling of MapReduce Framework on Distributed Clusters over High Speed Networks Network-Aware Scheduling of MapReduce Framework on Distributed Clusters over High Speed Networks Praveenkumar Kondikoppa, Chui-Hui Chiu, Cheng Cui, Lin Xue and Seung-Jong Park Department of Computer Science,

More information

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets

More information

Locality and Network-Aware Reduce Task Scheduling for Data-Intensive Applications

Locality and Network-Aware Reduce Task Scheduling for Data-Intensive Applications Locality and Network-Aware Reduce Task Scheduling for Data-Intensive Applications Engin Arslan University at Buffalo (SUNY) enginars@buffalo.edu Mrigank Shekhar Tevfik Kosar Intel Corporation University

More information

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components of Hadoop. We will see what types of nodes can exist in a Hadoop

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

Guidelines for Selecting Hadoop Schedulers based on System Heterogeneity

Guidelines for Selecting Hadoop Schedulers based on System Heterogeneity Noname manuscript No. (will be inserted by the editor) Guidelines for Selecting Hadoop Schedulers based on System Heterogeneity Aysan Rasooli Douglas G. Down Received: date / Accepted: date Abstract Hadoop

More information

Apache Hama Design Document v0.6

Apache Hama Design Document v0.6 Apache Hama Design Document v0.6 Introduction Hama Architecture BSPMaster GroomServer Zookeeper BSP Task Execution Job Submission Job and Task Scheduling Task Execution Lifecycle Synchronization Fault

More information

Big Data and Hadoop. Sreedhar C, Dr. D. Kavitha, K. Asha Rani

Big Data and Hadoop. Sreedhar C, Dr. D. Kavitha, K. Asha Rani Big Data and Hadoop Sreedhar C, Dr. D. Kavitha, K. Asha Rani Abstract Big data has become a buzzword in the recent years. Big data is used to describe a massive volume of both structured and unstructured

More information

Big Application Execution on Cloud using Hadoop Distributed File System

Big Application Execution on Cloud using Hadoop Distributed File System Big Application Execution on Cloud using Hadoop Distributed File System Ashkan Vates*, Upendra, Muwafaq Rahi Ali RPIIT Campus, Bastara Karnal, Haryana, India ---------------------------------------------------------------------***---------------------------------------------------------------------

More information

Large scale processing using Hadoop. Ján Vaňo

Large scale processing using Hadoop. Ján Vaňo Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine

More information

Mobile Storage and Search Engine of Information Oriented to Food Cloud

Mobile Storage and Search Engine of Information Oriented to Food Cloud Advance Journal of Food Science and Technology 5(10): 1331-1336, 2013 ISSN: 2042-4868; e-issn: 2042-4876 Maxwell Scientific Organization, 2013 Submitted: May 29, 2013 Accepted: July 04, 2013 Published:

More information

Analysis and Modeling of MapReduce s Performance on Hadoop YARN

Analysis and Modeling of MapReduce s Performance on Hadoop YARN Analysis and Modeling of MapReduce s Performance on Hadoop YARN Qiuyi Tang Dept. of Mathematics and Computer Science Denison University tang_j3@denison.edu Dr. Thomas C. Bressoud Dept. of Mathematics and

More information

Big Data Analysis and Its Scheduling Policy Hadoop

Big Data Analysis and Its Scheduling Policy Hadoop IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 1, Ver. IV (Jan Feb. 2015), PP 36-40 www.iosrjournals.org Big Data Analysis and Its Scheduling Policy

More information

marlabs driving digital agility WHITEPAPER Big Data and Hadoop

marlabs driving digital agility WHITEPAPER Big Data and Hadoop marlabs driving digital agility WHITEPAPER Big Data and Hadoop Abstract This paper explains the significance of Hadoop, an emerging yet rapidly growing technology. The prime goal of this paper is to unveil

More information

CDBMS Physical Layer issue: Load Balancing

CDBMS Physical Layer issue: Load Balancing CDBMS Physical Layer issue: Load Balancing Shweta Mongia CSE, School of Engineering G D Goenka University, Sohna Shweta.mongia@gdgoenka.ac.in Shipra Kataria CSE, School of Engineering G D Goenka University,

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

Log Mining Based on Hadoop s Map and Reduce Technique

Log Mining Based on Hadoop s Map and Reduce Technique Log Mining Based on Hadoop s Map and Reduce Technique ABSTRACT: Anuja Pandit Department of Computer Science, anujapandit25@gmail.com Amruta Deshpande Department of Computer Science, amrutadeshpande1991@gmail.com

More information

ISSN: 2321-7782 (Online) Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies

ISSN: 2321-7782 (Online) Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

A SURVEY ON MAPREDUCE IN CLOUD COMPUTING

A SURVEY ON MAPREDUCE IN CLOUD COMPUTING A SURVEY ON MAPREDUCE IN CLOUD COMPUTING Dr.M.Newlin Rajkumar 1, S.Balachandar 2, Dr.V.Venkatesakumar 3, T.Mahadevan 4 1 Asst. Prof, Dept. of CSE,Anna University Regional Centre, Coimbatore, newlin_rajkumar@yahoo.co.in

More information

Virtual Machine Based Resource Allocation For Cloud Computing Environment

Virtual Machine Based Resource Allocation For Cloud Computing Environment Virtual Machine Based Resource Allocation For Cloud Computing Environment D.Udaya Sree M.Tech (CSE) Department Of CSE SVCET,Chittoor. Andra Pradesh, India Dr.J.Janet Head of Department Department of CSE

More information

Matchmaking: A New MapReduce Scheduling Technique

Matchmaking: A New MapReduce Scheduling Technique Matchmaking: A New MapReduce Scheduling Technique Chen He Ying Lu David Swanson Department of Computer Science and Engineering University of Nebraska-Lincoln Lincoln, U.S. {che,ylu,dswanson}@cse.unl.edu

More information

Exploring the Efficiency of Big Data Processing with Hadoop MapReduce

Exploring the Efficiency of Big Data Processing with Hadoop MapReduce Exploring the Efficiency of Big Data Processing with Hadoop MapReduce Brian Ye, Anders Ye School of Computer Science and Communication (CSC), Royal Institute of Technology KTH, Stockholm, Sweden Abstract.

More information

PIKACHU: How to Rebalance Load in Optimizing MapReduce On Heterogeneous Clusters

PIKACHU: How to Rebalance Load in Optimizing MapReduce On Heterogeneous Clusters PIKACHU: How to Rebalance Load in Optimizing MapReduce On Heterogeneous Clusters Rohan Gandhi, Di Xie, Y. Charlie Hu Purdue University Abstract For power, cost, and pricing reasons, datacenters are evolving

More information

Load Rebalancing for File System in Public Cloud Roopa R.L 1, Jyothi Patil 2

Load Rebalancing for File System in Public Cloud Roopa R.L 1, Jyothi Patil 2 Load Rebalancing for File System in Public Cloud Roopa R.L 1, Jyothi Patil 2 1 PDA College of Engineering, Gulbarga, Karnataka, India rlrooparl@gmail.com 2 PDA College of Engineering, Gulbarga, Karnataka,

More information

Non-intrusive Slot Layering in Hadoop

Non-intrusive Slot Layering in Hadoop 213 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing Non-intrusive Layering in Hadoop Peng Lu, Young Choon Lee, Albert Y. Zomaya Center for Distributed and High Performance Computing,

More information

An Approach to Load Balancing In Cloud Computing

An Approach to Load Balancing In Cloud Computing An Approach to Load Balancing In Cloud Computing Radha Ramani Malladi Visiting Faculty, Martins Academy, Bangalore, India ABSTRACT: Cloud computing is a structured model that defines computing services,

More information

The Hadoop Framework

The Hadoop Framework The Hadoop Framework Nils Braden University of Applied Sciences Gießen-Friedberg Wiesenstraße 14 35390 Gießen nils.braden@mni.fh-giessen.de Abstract. The Hadoop Framework offers an approach to large-scale

More information

MapReduce (in the cloud)

MapReduce (in the cloud) MapReduce (in the cloud) How to painlessly process terabytes of data by Irina Gordei MapReduce Presentation Outline What is MapReduce? Example How it works MapReduce in the cloud Conclusion Demo Motivation:

More information

Heterogeneity-Aware Resource Allocation and Scheduling in the Cloud

Heterogeneity-Aware Resource Allocation and Scheduling in the Cloud Heterogeneity-Aware Resource Allocation and Scheduling in the Cloud Gunho Lee, Byung-Gon Chun, Randy H. Katz University of California, Berkeley, Yahoo! Research Abstract Data analytics are key applications

More information

The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform

The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform Fong-Hao Liu, Ya-Ruei Liou, Hsiang-Fu Lo, Ko-Chin Chang, and Wei-Tsong Lee Abstract Virtualization platform solutions

More information

Optimization and analysis of large scale data sorting algorithm based on Hadoop

Optimization and analysis of large scale data sorting algorithm based on Hadoop Optimization and analysis of large scale sorting algorithm based on Hadoop Zhuo Wang, Longlong Tian, Dianjie Guo, Xiaoming Jiang Institute of Information Engineering, Chinese Academy of Sciences {wangzhuo,

More information

Processing of Hadoop using Highly Available NameNode

Processing of Hadoop using Highly Available NameNode Processing of Hadoop using Highly Available NameNode 1 Akash Deshpande, 2 Shrikant Badwaik, 3 Sailee Nalawade, 4 Anjali Bote, 5 Prof. S. P. Kosbatwar Department of computer Engineering Smt. Kashibai Navale

More information

The International Journal Of Science & Technoledge (ISSN 2321 919X) www.theijst.com

The International Journal Of Science & Technoledge (ISSN 2321 919X) www.theijst.com THE INTERNATIONAL JOURNAL OF SCIENCE & TECHNOLEDGE Efficient Parallel Processing on Public Cloud Servers using Load Balancing Manjunath K. C. M.Tech IV Sem, Department of CSE, SEA College of Engineering

More information

Map-Parallel Scheduling (mps) using Hadoop environment for job scheduler and time span for Multicore Processors

Map-Parallel Scheduling (mps) using Hadoop environment for job scheduler and time span for Multicore Processors Map-Parallel Scheduling (mps) using Hadoop environment for job scheduler and time span for Sudarsanam P Abstract G. Singaravel Parallel computing is an base mechanism for data process with scheduling task,

More information

Survey on Job Schedulers in Hadoop Cluster

Survey on Job Schedulers in Hadoop Cluster IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 15, Issue 1 (Sep. - Oct. 2013), PP 46-50 Bincy P Andrews 1, Binu A 2 1 (Rajagiri School of Engineering and Technology,

More information

Dynamic Workload Management in Heterogeneous Cloud Computing Environments

Dynamic Workload Management in Heterogeneous Cloud Computing Environments Dynamic Workload Management in Heterogeneous Cloud Computing Environments Qi Zhang and Raouf Boutaba University of Waterloo IEEE/IFIP Network Operations and Management Symposium Krakow, Poland May 7, 2014

More information

Adaptive Task Scheduling for MultiJob MapReduce Environments

Adaptive Task Scheduling for MultiJob MapReduce Environments Adaptive Task Scheduling for MultiJob MapReduce Environments Jordà Polo, David de Nadal, David Carrera, Yolanda Becerra, Vicenç Beltran, Jordi Torres and Eduard Ayguadé Barcelona Supercomputing Center

More information

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model

More information

Telecom Data processing and analysis based on Hadoop

Telecom Data processing and analysis based on Hadoop COMPUTER MODELLING & NEW TECHNOLOGIES 214 18(12B) 658-664 Abstract Telecom Data processing and analysis based on Hadoop Guofan Lu, Qingnian Zhang *, Zhao Chen Wuhan University of Technology, Wuhan 4363,China

More information

Hadoop Design and k-means Clustering

Hadoop Design and k-means Clustering Hadoop Design and k-means Clustering Kenneth Heafield Google Inc January 15, 2008 Example code from Hadoop 0.13.1 used under the Apache License Version 2.0 and modified for presentation. Except as otherwise

More information

Efficient Data Replication Scheme based on Hadoop Distributed File System

Efficient Data Replication Scheme based on Hadoop Distributed File System , pp. 177-186 http://dx.doi.org/10.14257/ijseia.2015.9.12.16 Efficient Data Replication Scheme based on Hadoop Distributed File System Jungha Lee 1, Jaehwa Chung 2 and Daewon Lee 3* 1 Division of Supercomputing,

More information

MapReduce and Hadoop Distributed File System

MapReduce and Hadoop Distributed File System MapReduce and Hadoop Distributed File System 1 B. RAMAMURTHY Contact: Dr. Bina Ramamurthy CSE Department University at Buffalo (SUNY) bina@buffalo.edu http://www.cse.buffalo.edu/faculty/bina Partially

More information