A REAL TIME MEMORY SLOT UTILIZATION DESIGN FOR MAPREDUCE MEMORY CLUSTERS
Suma R 1, Vinay T R 2, Byre Gowda B K 3
1 Post graduate Student, CSE, SVCE, Bangalore
2 Assistant Professor, CSE, SVCE, Bangalore
3 Assistant Professor, CSE, SIR.MVIT, Bangalore

ABSTRACT
For large-scale data processing in the cloud, MapReduce splits the data into multiple parts, assigns the parts to slots, and processes them with map tasks followed by reduce tasks. Slot-based MapReduce is not very effective: it performs poorly because of unoptimized resource allocation, and it faces several challenges. MapReduce job execution has two unique features: map slots are allocated only to map tasks and reduce slots only to reduce tasks, and map tasks are processed before reduce tasks. Maximizing data locality is also required for efficiency and utilization. DynamicMR is a dynamic slot allocation framework that improves the performance of MapReduce [1]. DynamicMR focuses on the Hadoop Fair Scheduler (HFS). It consists of three optimization techniques: Dynamic Hadoop Slot Allocation (DHSA), Speculative Execution Performance Balancing (SEPB) and Slot Prescheduling.

1. INTRODUCTION
Big data is a collection of both structured and unstructured data that is too large, fast and distinct to be managed by traditional database management tools or traditional data processing applications. Hadoop is an open-source software framework from Apache that supports scalable distributed applications. It runs applications on large clusters of commodity hardware and provides fast and reliable analysis of both structured and unstructured data. Hadoop uses a simple programming model and can scale from single servers to thousands of machines, each offering local computation and storage.
Despite many studies on optimizing MapReduce/Hadoop, key challenges remain for improving the utilization and performance of a Hadoop cluster. Firstly, the resources (e.g., CPU cores) are abstracted as map and reduce slots. A MapReduce job execution has two unique features:
1. The slot allocation constraint: map slots are allocated only to map tasks and reduce slots only to reduce tasks.
2. Map tasks are executed first, and then the reduce tasks.
We have two observations here:
1. Different slot configurations give different system utilization and performance for a MapReduce workload.
2. Idle reduce slots affect the performance and system utilization.
Secondly, the straggler problem of map or reduce tasks delays the whole job. Thirdly, maximizing data locality improves the performance and slot utilization efficiency of a MapReduce workload.

DynamicMR has three techniques: Dynamic Hadoop Slot Allocation (DHSA), Speculative Execution Performance Balancing (SEPB) and Slot Prescheduling (SP).

DHSA
1. Slots can be used for either map tasks or reduce tasks. If there are insufficient map slots for the map tasks, the map tasks can borrow unused reduce slots. Similarly, reduce tasks can borrow unused map slots if the number of reduce tasks is greater than the number of reduce slots.
2. When there is no such shortage, map slots are used for map tasks and reduce slots are used for reduce tasks.
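The borrowing rule described for DHSA above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name allocate_slots, its parameters and the keys of the returned dictionary are all hypothetical, and fairness between pools is ignored.

```python
def allocate_slots(map_tasks, reduce_tasks, map_slots, reduce_slots):
    """Decide how many tasks of each kind run on each kind of slot.

    Hypothetical helper for illustration only; all names are assumptions.
    """
    # Each task type first uses the slots of its own type.
    map_on_map = min(map_tasks, map_slots)
    reduce_on_reduce = min(reduce_tasks, reduce_slots)

    # Slots that stay idle after the own-type allocation.
    idle_map = map_slots - map_on_map
    idle_reduce = reduce_slots - reduce_on_reduce

    # An overloaded side borrows idle slots from the other side.
    map_on_reduce = min(map_tasks - map_on_map, idle_reduce)
    reduce_on_map = min(reduce_tasks - reduce_on_reduce, idle_map)

    return {
        "map_on_map": map_on_map,
        "map_on_reduce": map_on_reduce,
        "reduce_on_reduce": reduce_on_reduce,
        "reduce_on_map": reduce_on_map,
    }


if __name__ == "__main__":
    # 10 map tasks and 2 reduce tasks on 6 map slots and 6 reduce slots:
    # the map side borrows the 4 idle reduce slots.
    print(allocate_slots(10, 2, 6, 6))
```

When both sides are overloaded, or neither is, both borrowed counts are zero, so the sketch falls back to the static map/reduce split.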
SEPB
SEPB deals with slow-running tasks. Speculative execution launches speculative tasks for them; we propose Speculative Execution Performance Balancing to balance the performance tradeoff between the execution time of a single job and that of a batch of jobs.

Slot Prescheduling
Slot Prescheduling is an approach for improving data locality in MapReduce. It does so without a negative impact on fairness (see Section 3.3).

DynamicMR improves the performance and utilization of MapReduce workloads by 46% - 115% for a single job and 49% - 112% for multiple jobs.

Fig-1: Overview of DynamicMR Framework (slot utilization optimization: DHSA with PI-DHSA and PD-DHSA; utilization efficiency optimization: SEPB and Slot PreScheduling)

MapReduce is popular in industry, bioinformatics and machine learning. Hadoop is an implementation of MapReduce [6]. In MapReduce, multiple tasks run on each node, and each node hosts a configured number of map and reduce slots. When a task is assigned to a slot, the slot becomes occupied; when the task completes, the slot is released. Resource underutilization can be overcome by resource stealing.

MapReduce uses speculative execution to support fault tolerance. The master node tracks the progress of all scheduled tasks. When the master node finds a slow-running task, it launches a speculative task so that the work finishes faster.

Processing huge data requires both a suitable framework and powerful hardware. Google proposed MapReduce for parallel data processing, and it takes a dynamic and aggressive approach. Fairness and data locality sometimes conflict with each other: strict fairness degrades data locality, and pure data locality results in unfair resource usage.

MapReduce is a programming model for large-scale data processing [2]. MapReduce is reported to process 20 petabytes of data per day. Hadoop is an open-source implementation of MapReduce and is used by companies such as Facebook.
Hadoop MapReduce uses a distributed storage layer referred to as the Hadoop Distributed File System (HDFS). A user submits a job comprising a map function and a reduce function, which are transformed into map tasks and reduce tasks respectively. HDFS splits the data into equal-sized blocks and distributes them across the cluster nodes, where the map tasks are performed. The intermediate output is partitioned among one or many reduce tasks. The Locality-Aware Reduce Task Scheduler (LARTS) uses partition sizes to achieve data locality.

2. EXISTING SYSTEM

Scheduling and resource allocation optimization
There is existing work on scheduling and resource allocation for MapReduce jobs. In MapReduce, one job has multiple tasks. The objective is to minimize the job completion time under the assumption that all jobs arrive at the same time. To achieve this, a computation model is developed for large-scale data problems and analyzed as a graph, and MapReduce is modeled as a two-stage hybrid flow shop. Job submission then results in improved system performance and utilization. However, the execution time of map and reduce tasks must be known in advance, which is not possible in real-world applications. DHSA, in contrast, can be used for any MapReduce workload.

Even with an optimal Hadoop configuration, e.g. the map/reduce slot configuration, there is room for improving the performance of a MapReduce workload. Guo et al. propose a method called resource stealing [3], which steals the resources reserved for idle slots by adopting a multi-threading technique for tasks running on multiple CPU cores. Polo et al. propose a resource-aware scheduling technique for MapReduce workloads, which improves resource utilization. In DHSA, system utilization is improved by allocating unused map and reduce slots.

YARN is the new version of Hadoop and overcomes the inefficiency problem of Hadoop. It manages resources such as memory and bandwidth. However, for multiple jobs DynamicMR is better than YARN, because YARN has no concept of slots.
Speculative Execution optimization
Speculative execution is used to deal with the straggler problem. Longest Approximate Time to End (LATE) is a speculative execution algorithm that focuses on heterogeneous environments and caps the number of speculative tasks. Guo et al. improve the performance of LATE by proposing Benefit Aware Speculative Execution (BASE), which considers the benefit of a speculative task. We propose SEPB to balance the tradeoff between a single job and a batch of jobs.

Data Locality Optimization
Data locality optimization improves the efficiency and performance of cluster utilization [4]. MapReduce has a map side and a reduce side. Map-side data locality optimization moves the map task close to its input data blocks. MapReduce jobs are classified into map-input heavy, map-and-reduce-input heavy, and reduce-input heavy. Reduce-side data locality places reduce tasks on the machines that hold the intermediate data generated by the map tasks. Slot prescheduling belongs to map-side data locality. Extra idle slots are used to maximize data locality and fairness. The delay scheduler and slot prescheduling are used to achieve fairness and data locality. The two slot optimizers, SEPB and Slot Prescheduling, build on DHSA.

MapReduce optimization on cloud computing
DynamicMR is a fine-grained optimization for Hadoop. Existing systems and DynamicMR can be combined to develop a framework that also considers budget in cloud computing.

3. PROPOSED SYSTEM
MapReduce performance can be improved from two perspectives. First, slots are classified into busy slots and idle slots, so one approach is to increase slot utilization by maximizing the number of busy slots and minimizing the number of idle slots. Second, not every busy slot is efficiently utilized, so our approach is also to improve the utilization of busy slots. DHSA increases slot utilization while maintaining fairness [4]. SEPB handles slow-running tasks. Slot prescheduling [10] improves performance through data locality while keeping fairness.
DynamicMR has the following step-by-step process:
1 When there is an idle slot, DynamicMR improves the slot utilization with DHSA. DynamicMR decides whether or not to allocate the slot, subject to constraints such as fairness.
2 If the allocation is approved, DynamicMR improves the efficiency of the slot with SEPB. Speculative execution achieves a performance tradeoff between a single job and a batch of jobs.
3 Idle slots are allocated to pending map tasks. DynamicMR improves the efficiency of slot utilization with slot prescheduling.

3.1 Dynamic Hadoop Slot Utilization:
The current MapReduce design suffers from underutilization of slots because the number of map and reduce tasks varies over time. There are times when the number of map (or reduce) tasks is greater than the number of map (or reduce) slots. When the reduce tasks are overloaded, unused map slots can be used for them, which improves MapReduce performance. Conversely, when all the workload lies on the map side, idle reduce slots are used for map tasks. Map and reduce tasks can thus run on either map slots or reduce slots. Two points must be considered:
1 Fairness is important in HFS: the allocation is fair when all pools are allocated an equal amount of resources.
2 The resource requirements of map slots and reduce slots are different [9]. Memory and network bandwidth are the main resources of reduce tasks.

DHSA has two alternatives, namely PD-DHSA and PI-DHSA.

Pool-Independent DHSA: The PI-DHSA process consists of 2 parts:
Fig-2: Pool-Independent DHSA
1 Intra-phase dynamic slot allocation: Each pool is divided into two sub-pools, i.e. a map-phase pool and a reduce-phase pool. A pool that is overloaded and has slot demand can borrow unused slots from other pools of the same phase. E.g. map-phase pool 2 can borrow map slots from map-phase pool 1 and pool 3.
2 Inter-phase dynamic slot allocation: When the reduce phase contains unused reduce slots and there are insufficient map slots for the map tasks, the map phase borrows idle slots from the reduce slots.

Nm - total number of map tasks.
Nr - total number of reduce tasks.
Sm - total number of map slots.
Sr - total number of reduce slots.

Case 1: When Nm ≤ Sm and Nr ≤ Sr, map tasks run on map slots and reduce tasks run on reduce slots, i.e. no slot borrowing takes place.
Case 2: When Nm > Sm and Nr < Sr, reduce slots are used for the reduce tasks and the idle reduce slots are used for running map tasks.
Case 3: When Nm < Sm and Nr > Sr, unused map slots are used for running reduce tasks.
Case 4: When Nm > Sm and Nr > Sr, the system is in a busy state and there is no movement of map and reduce slots.

We have two variables, PercentageOfBorrowedMapSlots and PercentageOfBorrowedReduceSlots.

PD-DHSA:
Fig-3: Pool-Dependent DHSA
Each pool, consisting of a map-phase pool and a reduce-phase pool, is selfish: it satisfies its own map-phase and reduce-phase demand from its own shared map and reduce slots before slots go to other pools. There are 2 processes:

1 Intra-pool dynamic slot allocation: Within a pool there are 4 possible relationships.
Case a: MapSlotsDemand < MapShare and ReduceSlotsDemand > ReduceShare. The reduce-phase pool first borrows unused map slots from the map-phase pool of the same pool for its overloaded reduce tasks.
Case b: MapSlotsDemand > MapShare and ReduceSlotsDemand < ReduceShare. The reduce-phase pool gives its unused slots to the map tasks of the same pool.
Case c: MapSlotsDemand ≤ MapShare and ReduceSlotsDemand ≤ ReduceShare. The map slots and reduce slots do not borrow any slots, and the pool can give slots to other pools.
Case d: MapSlotsDemand > MapShare and ReduceSlotsDemand > ReduceShare. Here both the map slots and the reduce slots are insufficient.
The pool therefore has to borrow map slots and reduce slots from other pools.

2 Inter-pool dynamic slot allocation:
MapSlotsDemand + ReduceSlotsDemand ≤ MapShare + ReduceShare: in this case there is no need to borrow slots from other pools.
MapSlotsDemand + ReduceSlotsDemand > MapShare + ReduceShare: in this case the slots are not enough even after intra-pool dynamic slot allocation, so the pool borrows unused slots from other pools.

A TaskTracker has 4 possible slot allocations.
Fig-4: Slot Allocation for PD-DHSA
Case 1: If the TaskTracker has idle map slots and the pool has pending map tasks, map tasks are allocated to the map slots.
Case 2: If case 1 fails, and the TaskTracker has idle reduce slots and the pool has pending reduce tasks, reduce tasks are allocated to the reduce slots.
Case 3: If case 1 and case 2 both fail, we try reduce slots for map tasks.
Case 4: We allocate map slots to reduce tasks.

3.2 Speculative execution performance balance:
The job execution time of MapReduce is very sensitive to slow-running tasks. Stragglers arise from faulty hardware and software misconfiguration. There are 2 types of stragglers, hard stragglers and soft stragglers.
Hard straggler: A task that goes into deadlock because it waits endlessly for certain resources. We should kill the task, because it will never finish on its own.
Soft straggler: A task that takes much longer than a common task but does complete successfully.
For a hard straggler, running a backup task means killing the straggler and running another task in its place. The straggler problem is detected by the LATE algorithm. Speculative execution reduces the execution time of a job.

Fig-5: Total number of pending map tasks and total number of pending reduce tasks.

In SEPB, failed tasks are given the highest priority first; pending tasks are considered second. LATE, which handles straggling tasks, launches a backup task and allocates a slot to it. Consider an example with 6 jobs. The speculative cap for LATE is 4, the maximum number of jobs checked for pending tasks is 4, and there are 4 idle slots. SEPB allocates all 4 idle slots to pending tasks, because the numbers of pending tasks for j1, j2, j3, j4, j5, j6 are 0, 0, 10, 10, 15, 20 respectively. SEPB works on top of LATE and is an enhancement of LATE.

3.3 Slot PreScheduling:
Slot prescheduling improves data locality [5] without a negative impact on the fairness of MapReduce jobs.
Defn 1: The available idle map slots are the idle map slots that can be allocated on a TaskTracker.
Defn 2: The extra idle map slots are the map slots of a TaskTracker that remain after subtracting the used map slots and the available idle map slots.
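The six-job SEPB example in Section 3.2 can be replayed with a short Python sketch. It is an illustration under stated assumptions and not the authors' code: the function sepb_assign, its parameters and the stragglers count are hypothetical, jobs are assumed to be served greedily in job order, and the priority for failed tasks is left out.

```python
def sepb_assign(idle_slots, pending_tasks, max_jobs_checked,
                speculative_cap, stragglers):
    """Split idle slots between pending tasks and speculative tasks.

    Hypothetical sketch: pending tasks of the first max_jobs_checked
    jobs are served before any speculative (backup) task is launched.
    """
    to_pending = {}
    remaining = idle_slots

    # Check the pending tasks of the first few jobs, in job order.
    for job, pending in list(pending_tasks.items())[:max_jobs_checked]:
        if remaining == 0:
            break
        granted = min(pending, remaining)
        if granted > 0:
            to_pending[job] = granted
            remaining -= granted

    # Leftover slots go to backup tasks, bounded by the speculative cap
    # and by the number of detected stragglers (an assumed input).
    to_speculative = min(remaining, speculative_cap, stragglers)
    return to_pending, to_speculative


if __name__ == "__main__":
    pending = {"j1": 0, "j2": 0, "j3": 10, "j4": 10, "j5": 15, "j6": 20}
    # 4 idle slots, 4 jobs checked, speculative cap 4, 3 assumed stragglers.
    print(sepb_assign(4, pending, 4, 4, stragglers=3))
```

With these inputs all 4 idle slots go to pending tasks and none to speculative tasks, matching the outcome stated in the example. If the checked jobs had no pending tasks, the same slots would go to backup tasks up to the cap.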
Technique   Fairness   Slot Utilization   Performance
DHSA
SEPB        + +
DS          _          %(+)               +
SPS         +          %(+)               +
TABLE 1: +, _, % denote benefit, cost and efficiency respectively.

4. CONCLUSION
The DynamicMR framework improves the performance of MapReduce workloads while maintaining fairness. Its three techniques, DHSA, SEPB and Slot Prescheduling, all focus on the utilization of slots in a MapReduce cluster. DHSA maximizes slot utilization. SEPB identifies the inefficiency of slots. Slot prescheduling improves slot utilization efficiency. Combining these techniques improves the Hadoop system.

REFERENCES
[1] Q. Chen, C. Liu, Z. Xiao. Improving MapReduce Performance Using Smart Speculative Execution Strategy. IEEE Transactions on Computers.
[2] J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI 04.
[3] Z. H. Guo, G. Fox, M. Zhou, Y. Ruan. Improving Resource Utilization in MapReduce. In IEEE Cluster 12.
[4] Z. H. Guo, G. Fox, and M. Zhou. Investigation of Data Locality and Fairness in MapReduce. In MapReduce 12, pp. 25-32.
[5] Z. H. Guo, G. Fox, and M. Zhou. Investigation of Data Locality in MapReduce. In IEEE/ACM CCGrid 12.
[6] Hadoop.
[7] M. Hammoud and M. F. Sakr. Locality-Aware Reduce Task Scheduling for MapReduce. In IEEE CLOUDCOM 11.
[8] M. Hammoud, M. S. Rehman, M. F. Sakr. Center-of-Gravity Reduce Task Scheduling to Lower MapReduce Network Traffic. In IEEE CLOUD 12.
[9] B. Palanisamy, A. Singh, L. Liu and B. Jain. Purlieus: Locality-aware Resource Allocation for MapReduce in a Cloud. In SC 11, pp. 1-11.
[10] J. Polo, C. Castillo, D. Carrera, et al. Resource-aware Adaptive Scheduling for MapReduce Clusters. In Middleware 11.
More informationAn efficient Mapreduce scheduling algorithm in hadoop R.Thangaselvi 1, S.Ananthbabu 2, R.Aruna 3
An efficient Mapreduce scheduling algorithm in hadoop R.Thangaselvi 1, S.Ananthbabu 2, R.Aruna 3 1 M.E: Department of Computer Science, VV College of Engineering, Tirunelveli, India 2 Assistant Professor,
More informationA SURVEY ON MAPREDUCE IN CLOUD COMPUTING
A SURVEY ON MAPREDUCE IN CLOUD COMPUTING Dr.M.Newlin Rajkumar 1, S.Balachandar 2, Dr.V.Venkatesakumar 3, T.Mahadevan 4 1 Asst. Prof, Dept. of CSE,Anna University Regional Centre, Coimbatore, newlin_rajkumar@yahoo.co.in
More informationBig Application Execution on Cloud using Hadoop Distributed File System
Big Application Execution on Cloud using Hadoop Distributed File System Ashkan Vates*, Upendra, Muwafaq Rahi Ali RPIIT Campus, Bastara Karnal, Haryana, India ---------------------------------------------------------------------***---------------------------------------------------------------------
More informationLarge scale processing using Hadoop. Ján Vaňo
Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine
More informationLoad Rebalancing for File System in Public Cloud Roopa R.L 1, Jyothi Patil 2
Load Rebalancing for File System in Public Cloud Roopa R.L 1, Jyothi Patil 2 1 PDA College of Engineering, Gulbarga, Karnataka, India rlrooparl@gmail.com 2 PDA College of Engineering, Gulbarga, Karnataka,
More informationNoSQL and Hadoop Technologies On Oracle Cloud
NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath
More informationThe Hadoop Framework
The Hadoop Framework Nils Braden University of Applied Sciences Gießen-Friedberg Wiesenstraße 14 35390 Gießen nils.braden@mni.fh-giessen.de Abstract. The Hadoop Framework offers an approach to large-scale
More informationIntroduction to Hadoop. New York Oracle User Group Vikas Sawhney
Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop
More informationGuidelines for Selecting Hadoop Schedulers based on System Heterogeneity
Noname manuscript No. (will be inserted by the editor) Guidelines for Selecting Hadoop Schedulers based on System Heterogeneity Aysan Rasooli Douglas G. Down Received: date / Accepted: date Abstract Hadoop
More informationHadoop implementation of MapReduce computational model. Ján Vaňo
Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed
More informationLog Mining Based on Hadoop s Map and Reduce Technique
Log Mining Based on Hadoop s Map and Reduce Technique ABSTRACT: Anuja Pandit Department of Computer Science, anujapandit25@gmail.com Amruta Deshpande Department of Computer Science, amrutadeshpande1991@gmail.com
More informationMatchmaking: A New MapReduce Scheduling Technique
Matchmaking: A New MapReduce Scheduling Technique Chen He Ying Lu David Swanson Department of Computer Science and Engineering University of Nebraska-Lincoln Lincoln, U.S. {che,ylu,dswanson}@cse.unl.edu
More informationISSN: 2321-7782 (Online) Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies
ISSN: 2321-7782 (Online) Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
More informationMobile Storage and Search Engine of Information Oriented to Food Cloud
Advance Journal of Food Science and Technology 5(10): 1331-1336, 2013 ISSN: 2042-4868; e-issn: 2042-4876 Maxwell Scientific Organization, 2013 Submitted: May 29, 2013 Accepted: July 04, 2013 Published:
More informationPIKACHU: How to Rebalance Load in Optimizing MapReduce On Heterogeneous Clusters
PIKACHU: How to Rebalance Load in Optimizing MapReduce On Heterogeneous Clusters Rohan Gandhi, Di Xie, Y. Charlie Hu Purdue University Abstract For power, cost, and pricing reasons, datacenters are evolving
More informationMapReduce (in the cloud)
MapReduce (in the cloud) How to painlessly process terabytes of data by Irina Gordei MapReduce Presentation Outline What is MapReduce? Example How it works MapReduce in the cloud Conclusion Demo Motivation:
More informationNon-intrusive Slot Layering in Hadoop
213 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing Non-intrusive Layering in Hadoop Peng Lu, Young Choon Lee, Albert Y. Zomaya Center for Distributed and High Performance Computing,
More informationAnalysis and Modeling of MapReduce s Performance on Hadoop YARN
Analysis and Modeling of MapReduce s Performance on Hadoop YARN Qiuyi Tang Dept. of Mathematics and Computer Science Denison University tang_j3@denison.edu Dr. Thomas C. Bressoud Dept. of Mathematics and
More informationOptimization and analysis of large scale data sorting algorithm based on Hadoop
Optimization and analysis of large scale sorting algorithm based on Hadoop Zhuo Wang, Longlong Tian, Dianjie Guo, Xiaoming Jiang Institute of Information Engineering, Chinese Academy of Sciences {wangzhuo,
More informationOutline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging
Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging
More informationCDBMS Physical Layer issue: Load Balancing
CDBMS Physical Layer issue: Load Balancing Shweta Mongia CSE, School of Engineering G D Goenka University, Sohna Shweta.mongia@gdgoenka.ac.in Shipra Kataria CSE, School of Engineering G D Goenka University,
More informationBSPCloud: A Hybrid Programming Library for Cloud Computing *
BSPCloud: A Hybrid Programming Library for Cloud Computing * Xiaodong Liu, Weiqin Tong and Yan Hou Department of Computer Engineering and Science Shanghai University, Shanghai, China liuxiaodongxht@qq.com,
More informationCSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model
More informationBig Data Analysis and Its Scheduling Policy Hadoop
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 1, Ver. IV (Jan Feb. 2015), PP 36-40 www.iosrjournals.org Big Data Analysis and Its Scheduling Policy
More informationTelecom Data processing and analysis based on Hadoop
COMPUTER MODELLING & NEW TECHNOLOGIES 214 18(12B) 658-664 Abstract Telecom Data processing and analysis based on Hadoop Guofan Lu, Qingnian Zhang *, Zhao Chen Wuhan University of Technology, Wuhan 4363,China
More informationMapReduce and Hadoop Distributed File System
MapReduce and Hadoop Distributed File System 1 B. RAMAMURTHY Contact: Dr. Bina Ramamurthy CSE Department University at Buffalo (SUNY) bina@buffalo.edu http://www.cse.buffalo.edu/faculty/bina Partially
More informationVirtual Machine Based Resource Allocation For Cloud Computing Environment
Virtual Machine Based Resource Allocation For Cloud Computing Environment D.Udaya Sree M.Tech (CSE) Department Of CSE SVCET,Chittoor. Andra Pradesh, India Dr.J.Janet Head of Department Department of CSE
More informationA Survey of Cloud Computing Guanfeng Octides
A Survey of Cloud Computing Guanfeng Nov 7, 2010 Abstract The principal service provided by cloud computing is that underlying infrastructure, which often consists of compute resources like storage, processors,
More informationAn Approach to Load Balancing In Cloud Computing
An Approach to Load Balancing In Cloud Computing Radha Ramani Malladi Visiting Faculty, Martins Academy, Bangalore, India ABSTRACT: Cloud computing is a structured model that defines computing services,
More informationEfficient Data Replication Scheme based on Hadoop Distributed File System
, pp. 177-186 http://dx.doi.org/10.14257/ijseia.2015.9.12.16 Efficient Data Replication Scheme based on Hadoop Distributed File System Jungha Lee 1, Jaehwa Chung 2 and Daewon Lee 3* 1 Division of Supercomputing,
More informationExploring the Efficiency of Big Data Processing with Hadoop MapReduce
Exploring the Efficiency of Big Data Processing with Hadoop MapReduce Brian Ye, Anders Ye School of Computer Science and Communication (CSC), Royal Institute of Technology KTH, Stockholm, Sweden Abstract.
More informationPerformance and Energy Efficiency of. Hadoop deployment models
Performance and Energy Efficiency of Hadoop deployment models Contents Review: What is MapReduce Review: What is Hadoop Hadoop Deployment Models Metrics Experiment Results Summary MapReduce Introduced
More informationhttp://www.paper.edu.cn
5 10 15 20 25 30 35 A platform for massive railway information data storage # SHAN Xu 1, WANG Genying 1, LIU Lin 2** (1. Key Laboratory of Communication and Information Systems, Beijing Municipal Commission
More informationLifetime Management of Cache Memory using Hadoop Snehal Deshmukh 1 Computer, PGMCOE, Wagholi, Pune, India
Volume 3, Issue 1, January 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com ISSN:
More informationEnergy Constrained Resource Scheduling for Cloud Environment
Energy Constrained Resource Scheduling for Cloud Environment 1 R.Selvi, 2 S.Russia, 3 V.K.Anitha 1 2 nd Year M.E.(Software Engineering), 2 Assistant Professor Department of IT KSR Institute for Engineering
More informationThe Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform
The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform Fong-Hao Liu, Ya-Ruei Liou, Hsiang-Fu Lo, Ko-Chin Chang, and Wei-Tsong Lee Abstract Virtualization platform solutions
More informationEnhancing MapReduce Functionality for Optimizing Workloads on Data Centers
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 10, October 2013,
More information