Do You Feel the Lag of Your Hadoop?
Yuxuan Jiang, Zhe Huang, and Danny H.K. Tsang
Department of Electronic and Computer Engineering
The Hong Kong University of Science and Technology, Hong Kong
{yjiangad, ecefelix,

Abstract. The configuration of a Hadoop cluster is critical to its performance: an improper configuration can greatly degrade job execution. Unfortunately, systematic guidelines on how to configure a Hadoop cluster are still missing. In this paper, we undertake an empirical study of key operations and mechanisms of Hadoop job execution, including the task assignment strategy and speculative execution. Based on the experiments, we provide suggestions on system configuration, particularly on matching the hardware resource partitioning scheme to the job splitting granularity.

I. INTRODUCTION
Recent years have witnessed rapidly increasing demand for large-scale data processing, such as webpage indexing, data mining, scientific simulation, and spam detection. For example, Facebook processed more than 500 TB of new data every day according to a study conducted in 2012 [1]. MapReduce [2] has emerged as a promising parallel processing framework for big data analytics. Apache Hadoop [3] is the de facto open-source standard implementation of the MapReduce framework. It has been adopted by numerous users throughout the world, including Twitter, eBay, Yahoo, Facebook, and Hulu [4]. The popularity of Hadoop is demonstrated by a recent report showing that the production Hadoop cluster operated by Yahoo successfully processed many thousands of jobs from various users over a period of ten months [5]. Hadoop carries out enormous data analysis jobs on computing clusters in a scale-out manner. The Hadoop framework parallelizes the analysis by separating the processing into two parts: Map tasks, which perform filtering and sorting, and Reduce tasks, which perform a summary operation.
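The Map/Reduce split described above can be illustrated with a minimal, pure-Python word count. This is a sketch of the programming model only, not Hadoop's actual Java API; the function names are our own.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(document):
    """Map task: emit (word, 1) for every word in one input portion."""
    for word in document.split():
        yield (word, 1)

def reduce_phase(key, values):
    """Reduce task: summarize all counts emitted for one key."""
    return (key, sum(values))

def run_job(documents):
    # Shuffle step: sort and group intermediate pairs by key, as the
    # framework does between the Map and Reduce phases.
    intermediate = sorted(
        (pair for doc in documents for pair in map_phase(doc)),
        key=itemgetter(0),
    )
    return dict(
        reduce_phase(key, (v for _, v in group))
        for key, group in groupby(intermediate, key=itemgetter(0))
    )

counts = run_job(["big data big cluster", "big job"])
# counts == {"big": 3, "cluster": 1, "data": 1, "job": 1}
```

In a real Hadoop job, the Map and shuffle work is distributed over many nodes; this single-process version only shows how the two phases divide the computation.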
The performance of a Hadoop cluster depends on how Map and Reduce tasks are scheduled onto the separate nodes of the cluster. On the one hand, job schedulers [5]-[7] have been proposed to address different performance issues in allocating resources to jobs. On the other hand, enhancing data locality through task-level scheduling inside jobs is also important [8]-[11]. However, resource allocation in Hadoop is complicated and depends on many system parameters and design decisions. Without careful fine-tuning, system performance can be far from optimal. Unfortunately, the resource allocation mechanisms in Hadoop are not documented in detail; for most users, they operate as black boxes. This motivates us to study these mechanisms empirically through extensive experiments. The goal of this paper is to shed light on how to properly configure the Hadoop system so as to improve job execution performance. To understand how Hadoop interacts with system configurations, we investigate its detailed behavior. In particular, we are interested in the following issues: (1) Map and Reduce task assignment preference; (2) the Hadoop speculative execution mechanism for tasks; (3) the hardware granularity of Hadoop task slots; and (4) the granularity of job splitting. More specifically, the task assignment preference determines which Map or Reduce task is assigned to which node for execution. The speculative execution mechanism determines when and which task should be proactively duplicated as a backup for fault tolerance. The hardware granularity of Hadoop task slots decides how resources are partitioned and shared among multiple tasks, and the granularity of job splitting determines the size of each task of a job. These aspects are closely entangled, and improper configuration can easily create a severe performance bottleneck. Official documents discuss these issues only at a very high level [3].
Currently, users mainly rely on their own experience to come up with configuration parameters. From the experiments in this paper, our key observations and conclusions include:
- Hadoop task assignment depends only on data locality. When data locality is taken out of consideration, performance bottlenecks emerge due to the lack of workload balancing.
- The Hadoop speculative execution mechanism is simple and heuristic. Imperative backup task execution may be delayed or prevented, while unnecessary backups are likely to be created.
- Matching the hardware resource partitioning granularity with the job splitting granularity significantly improves job execution performance. However, this matching requires the user to have detailed knowledge of both the jobs and the hardware configuration of the cluster.
These points provide a general guideline for configuring the system parameters. More importantly, they open up new directions for improving the current Hadoop implementation. The rest of this paper is organized as follows. Key factors in Hadoop job execution are presented in Section II. Our experimental settings are described in Section III. Experimental results for task assignment, speculative execution and granularity matching are reported and analyzed in Sections IV, V and VI, respectively. Finally, Section VII concludes the paper.

II. BACKGROUND OF HADOOP JOB EXECUTION
A. Hadoop Resource Provisioning and Job Splitting
Computing task parallelization lies at the core of the MapReduce framework. A particular Hadoop job is parallelized
into multiple Map and Reduce tasks. Each Map task processes a portion of the data stored in the Hadoop Distributed File System (HDFS); Reduce tasks then sort and combine the Map results. Many factors influence the execution performance of a Hadoop job. One is the job splitting scheme. Data associated with a job are split into blocks and stored in the HDFS. The number of Map tasks of one job should be no smaller than the number of data blocks associated with the job; by default, the two are exactly equal, and only one Reduce task is generated. The total number of tasks of a job should be carefully determined. If a job is split into a large number of tasks, each task requires less time and fewer resources to process, which makes task scheduling more flexible; but an excessive number of tasks introduces unnecessary queueing delay and extra task initialization overhead. If the number of tasks is too small, however, scheduling becomes inflexible: once all tasks have been scheduled, the Hadoop cluster cannot start new ones, even if it has spare capacity. Another key factor that influences job execution performance is the hardware resource partitioning granularity. In classical Hadoop (i.e., before Hadoop 2.0), the computing resources of a cluster are divided into capacity units called task slots, and one task slot can process only one Hadoop task. In the framework from Hadoop 2.0 onward, Apache YARN [12] is introduced to manage hardware resources, and the basic capacity units are called containers. To simplify the discussion, in this paper we refer to the capacity required to execute one job task as a task slot. There exists a trade-off between the number of simultaneous task executions and the program parallelization efficiency.
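The default splitting rule above (one Map task per HDFS block, one Reduce task) can be sketched as follows. The 128 MiB default block size is the value used by recent Hadoop releases; classical Hadoop defaulted to 64 MiB.

```python
import math

def default_num_map_tasks(input_bytes, block_bytes=128 * 1024 * 1024):
    """Hadoop's default: one Map task per HDFS block of the input."""
    return max(1, math.ceil(input_bytes / block_bytes))

def default_num_reduce_tasks():
    """By default, a single Reduce task summarizes all Map output."""
    return 1

# A 1 GiB input under 128 MiB blocks is processed by 8 Map tasks.
assert default_num_map_tasks(1 << 30) == 8
assert default_num_reduce_tasks() == 1
```

Both numbers are tunable per job; the sections that follow examine why the defaults are often a poor match for the cluster's slot configuration.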
When the number of tasks that can be executed in parallel is too small, jobs experience unnecessary queueing delay because unscheduled tasks must wait for vacant task slots. However, if the cluster configuration allows an excessive number of simultaneous task executions, multi-thread scheduling overheads and resource-sharing bottlenecks among slots emerge and deteriorate performance.

B. Speculative Execution
Speculative execution is a fault-tolerance mechanism in Hadoop. It constantly monitors the progress of tasks; if a task falls behind in progress and is in danger of failure, the mechanism proactively creates an identical backup task in an available slot. Because backup tasks consume extra resources, unnecessary backups can slow down the execution of other active jobs. As a result, the speculative execution mechanism must be carefully designed.

III. EXPERIMENTAL SETUP
In the following experiments, a slot-based Hadoop 1.x release is deployed onto a computing cluster. Since this paper focuses on the Hadoop operations and mechanisms for a single job, a relatively small cluster with one master node and six slave nodes is sufficient. The master node is hosted by an HP Compaq DX00 desktop, and the six slave nodes are hosted by homogeneous virtual machines (VMs) in our private computing cloud, which consists of Dell PowerEdge T0 servers. Each VM is allocated four virtual processing cores from Intel Xeon E0 CPUs, along with its own memory. The HDFS is built on local storage on each server rather than on a network-attached storage system. In the experiments, the word-count job is adopted as the benchmark application. According to a trace study of a Yahoo production Hadoop cluster [5], the large majority of submitted jobs are map-only or map-mostly; the word-count job falls into the map-mostly category.
This program is widely adopted in Hadoop performance analysis experiments by MIT [13], Intel [14] and others. Since data locality in the Hadoop system is a well-studied subject [8]-[11], we investigate other factors that influence performance by eliminating the data locality issue: in all of the following experiments, the HDFS replication parameter is set to six, so every slave node maintains a complete copy of all the data. The Hadoop capacity scheduler is used to submit jobs. Only one job is submitted at a time, so the microscopic performance of a single job can be studied in detail.

IV. HADOOP TASK ASSIGNMENT PREFERENCE
The task assignment preference of Hadoop determines how workload is balanced among slave nodes. In this part of the experiments, we investigate how Hadoop assigns job tasks to active slave nodes. An English Wikipedia dump [15] is split into data blocks in the HDFS, creating a number of Map tasks deliberately chosen to produce uneven workloads among the slave nodes. One Reduce task is used to summarize the Map results. Taking Map task assignment as the example, the experiment is repeated four times, each run partitioning the hardware resources of the cluster into a different total number of task slots. Fig. 1(a) shows the execution time of each Map task under the first per-node Map slot configuration; tasks assigned to the same slave node are drawn in the same color. The results indicate that several slaves are left totally idle, one slave hosts only a single task, and the remaining slaves are fully loaded with the rest of the tasks. Figs. 1(b), (c) and (d) show similar results when each slave node is configured with progressively more Map slots.
In the above experiments, the results clearly indicate that job tasks are assigned to slave nodes one by one: the task scheduler assigns tasks to another slave node only when the current slave node's task slots are fully occupied. As a result, there is only one under-utilized slave node, which hosts the leftover job tasks. More importantly, the tasks hosted by this under-utilized slave node complete much earlier than the tasks running on fully utilized slave nodes (see the short-running tasks in each panel of Fig. 1). This observation contradicts the belief that Hadoop slots are homogeneous computational resources.

[Fig. 1. Execution Time Distribution for Map Tasks. Panels (a)-(d) show per-task execution times under four different totals of Map slots; tasks on the same slave node share one color.]

The performance of a Hadoop task slot on a slave node depends on the total workload on that node. This can be explained by the resource bottleneck caused by sharing among all the task slots on a single slave node; for example, the storage bottleneck of a slave node prevents a large number of Map tasks from reading data at the same time. We learn that Hadoop task assignment mainly depends on data locality, and that the lack of workload balancing can seriously degrade performance. We therefore suggest building load balancing into Hadoop task assignment; for example, a simple heuristic would be to distribute tasks evenly among all active slave nodes while still respecting data locality.

V. HADOOP SPECULATIVE EXECUTION
The speculative execution mechanism launches backups for active tasks whose progress is slow. However, the official Hadoop documents give no clear description of the criterion that triggers backup tasks [3]. In this section, we study those triggering conditions. To simplify the experiment, a small ebook downloaded from Project Gutenberg [16] is submitted to the cluster, creating a handful of Map tasks. Artificial delay is programmed into the Map function of the word-count job, so the progress of a Map task can be slowed evenly in fine-grained increments of a fraction of a percent. The cluster is configured with enough Map slots for all tasks to run in a single wave. Without introduced delay, our measurements show that a Map task takes on the order of seconds to complete.
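The even-distribution heuristic suggested at the end of Section IV can be contrasted with the observed fill-one-node-first behavior in a toy model where co-located tasks slow each other down. All numbers here are hypothetical; the cost model is only meant to echo the resource-sharing bottleneck seen in the experiments.

```python
def fill_first(num_tasks, num_nodes, slots_per_node):
    """Observed Hadoop behavior: exhaust one node's slots before the next."""
    loads = [0] * num_nodes
    for t in range(num_tasks):
        loads[t // slots_per_node] += 1
    return loads

def round_robin(num_tasks, num_nodes):
    """Suggested heuristic: spread tasks evenly over all active nodes."""
    loads = [0] * num_nodes
    for t in range(num_tasks):
        loads[t % num_nodes] += 1
    return loads

def makespan(loads, base_time=1.0):
    """Toy cost model: slots on one node share its disk and cores, so each
    task's run time scales with the number of co-located tasks (Sec. IV)."""
    return max(base_time * load for load in loads)

# 10 tasks on 6 nodes with 4 slots each: filling node-by-node loads two
# nodes with 4 tasks apiece, while round-robin caps every node at 2.
assert makespan(fill_first(10, 6, 4)) > makespan(round_robin(10, 6))
```

With full replication (as in these experiments) every placement is data-local, so the balanced policy costs nothing in locality while halving the makespan in this toy setting.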
The experimental settings above allow us to freely control the duration of each individual Map task. Fig. 2(a) shows the execution progressions of the Map tasks over time when the same amount of delay is introduced for all tasks; all tasks have similar progress rates, and no backup task is triggered. Fig. 2(b) shows that two backup tasks are generated when extra delay is introduced for two tasks, leading to slow progress for them. In Fig. 2(c), when one of the delayed tasks receives less extra delay than the other (but still more than the healthy tasks), a backup is triggered only for the more heavily delayed task. These results imply that one necessary condition for the speculative execution mechanism to launch a backup is the existence of a slow-running task whose progress rate falls below a certain threshold relative to the other, healthy tasks. However, the progress rate threshold is not the only triggering criterion. In the next part of the experiment, artificial delay is introduced for only one task; by continuously increasing the injected delay from zero, the timing condition that triggers backups can be located. Fig. 2(d) shows that no backup task is created if the delayed task finishes early enough, while Fig. 2(e) shows that one backup task is launched once the delayed task's execution time is sufficiently long. Note that the delayed task has a much slower progress rate than its peers in both Figs. 2(d) and 2(e); if the progress rate threshold were the only condition, backups would have been triggered in both cases. More interestingly, we observe that whenever the delayed task runs long enough, its backup is always launched at around t = 60 s. This suggests that speculative execution only begins monitoring existing tasks for backup creation after the job has run for about a minute.
In Fig. 2(f), each slave node is configured with only one Map task slot, so the Map tasks cannot all be scheduled within a single wave of the Map phase. Artificial delay is introduced for all Map tasks in the same way as in Fig. 2(a). In this case, backup tasks are launched for the second-wave tasks right after those original tasks start. Combined with the observations from Figs. 2(d) and 2(e), we conclude that backup tasks can only be launched after the job has been running for some time; in our experiment this period is an absolute value of around one minute. Additionally, Hadoop checks the absolute progress of the targeted normal task before launching its backup: if the targeted task is approaching completion, no backup is launched.

Fig. 2(f) also illustrates another feature of speculative execution. In this figure, the execution times of all Map tasks are close to one another. For the tasks scheduled in the second wave of the Map phase, backup tasks are launched almost immediately after the corresponding normal tasks are created, meaning these tasks are considered at risk of failure from the very beginning of their execution. This implies that it is the absolute progress rate of a task at each time point, not the progress rate relative to the time at which the task was scheduled, that is compared against its peers. This strategy obviously increases the probability that backup tasks are launched.

[Fig. 2. Transient Progressions of Map Tasks. Panels: (a) uniform delay; (b) extra delay for two tasks; (c) higher delay for one task; (d) delay introduced for only one task; (e) more delay for that task; (f) uniform delay but fewer Map slots.]

However, the design is reasonable in terms of fully utilizing the computing resources of the cluster. In Hadoop, unscheduled normal tasks have higher priority for free task slots than backup tasks. If free slots remain after all normal tasks have been scheduled, it is better to use them for backups that proactively guard against failure than to leave them idle. Nevertheless, and more importantly, a potential design weakness of slow-task detection is exposed here: genuinely slow tasks in terms of relative progress rate may find no free slots for their backups if those slots are occupied by unnecessary backups of tasks that look slow by absolute progress rate but are healthy by relative progress rate.

We summarize our observations into the following conditions for speculative execution to launch a backup task:
- All normal tasks have already been scheduled, and at least one free task slot exists in the cluster.
- There exists a slow-running task whose progress rate is behind some threshold.
- The job has lasted for around one minute or longer.
- The slow-running task is not approaching completion at the time its backup would be triggered.
Our observations agree with the description of the speculative execution mechanism by Dinu et al. [17]. In summary, the current implementation of speculative execution is simple and heuristic: unnecessary backups are likely to be generated, while genuinely slow tasks in terms of relative progress rate may not be granted resources for backups.
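The four summarized conditions can be collected into a single predicate. The concrete thresholds here (`rate_gap`, the 60-second job age, the near-completion cutoff) are hypothetical placeholders: the experiments establish that such thresholds exist, not their exact values.

```python
def should_launch_backup(task, peers, free_slots, unscheduled_tasks,
                         job_age_s, rate_gap=0.2, min_job_age_s=60.0,
                         near_done=0.9):
    """Sketch of the four observed backup-trigger conditions (Sec. V).

    A task is a (progress, rate) pair: progress in [0, 1], and rate the
    absolute progress per second measured from job start, which is how
    the experiments suggest Hadoop compares tasks."""
    progress, rate = task
    mean_peer_rate = sum(r for _, r in peers) / len(peers)
    return (unscheduled_tasks == 0 and free_slots > 0    # condition 1
            and rate < (1 - rate_gap) * mean_peer_rate   # condition 2
            and job_age_s >= min_job_age_s               # condition 3
            and progress < near_done)                    # condition 4

slow, fast = (0.30, 0.005), (0.80, 0.013)
assert should_launch_backup(slow, [fast, fast], free_slots=2,
                            unscheduled_tasks=0, job_age_s=90.0)
# The same slow task triggers nothing while the job is under a minute old.
assert not should_launch_backup(slow, [fast, fast], free_slots=2,
                                unscheduled_tasks=0, job_age_s=30.0)
```

Note that condition 2 uses the absolute rate measured from job start; replacing it with a rate relative to each task's own start time would avoid the spurious second-wave backups seen in Fig. 2(f).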
Also, the creation of backup tasks may be delayed by the condition that backups can only be launched after the job has lasted for about one minute. To improve the speculative execution mechanism, we suggest applying more advanced algorithms (e.g., machine learning techniques) to the detection of slow-running tasks.

VI. RESOURCE PARTITION AND JOB SPLITTING SCHEMES
Hardware resource partitioning and job splitting granularity determine how efficiently the workload is parallelized and distributed inside the cluster. Given fixed hardware resources, we aim to find the optimal number of task slots per slave node, together with the job splitting strategy that best matches the hardware resource partitioning scheme. In the following experiments, the same English Wikipedia dump data is used, and the number of Map task slots on each slave node is varied over a range of settings. We focus mainly on the Map phase because it accounts for the vast majority of the total job execution time in our experiment. To quantify the effects of job splitting, job execution times are measured when the job is split into different numbers of Map tasks, for each setting of the total number of Map task slots. The measurements are reported in Table I. Reading down a column (one fixed job splitting scheme), as the total number of Map slots increases, the job execution time follows a generic trend: it first decreases, reaches its optimum (the shortest time), and then increases again. Insufficient partitioning of task slots on a node leads to severe queueing delay for tasks because parallelization is insufficient; an excessive number of task slots on one node, on the other hand, incurs extra multi-thread scheduling overheads.
Also in this situation, a heavy workload on one node can cause a resource-sharing bottleneck among task slots, as indicated in Section IV, further retarding task execution.

[TABLE I. Measurement Results of Word-Count Job Execution Times (sec). Rows index the total number of Map task slots; columns index the number of job splits.]

We infer from Table I that, in general, the optimal number of total task slots in one cluster is slightly larger than the total number of processing cores in the cluster (24 in our setup of six four-core slave nodes). Given the total number of Map slots, job execution time varies with the job splitting scheme, which can be observed along the rows of Table I; thus, an appropriate matching between the total number of task slots and the total number of job splits is required. Table I shows that splitting a job into exactly as many Map tasks as there are Map slots in the cluster achieves the best performance. Following the observations in Section IV, this splitting strategy ensures that every slot is assigned a normal task and the overall Map execution takes just one wave. If the total number of Map tasks is smaller than the total number of Map slots, performance degrades through resource under-utilization: some slots, or even whole nodes, are left idle or used for backups. Likewise, splitting a job into an integer multiple of the total number of Map slots achieves sub-optimal performance because, on the whole, each slot is assigned one normal task per wave. However, splitting a job into too many Map tasks deteriorates execution performance and should be avoided: the data processing is divided into tiny units, and initialization overheads come to dominate task execution. In practice, tasks with longer execution times have higher probabilities of failure, so a job covering a huge volume of data is better split into a larger number of tasks to reduce the cost of failure recovery.
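The matching rule above (use exactly the cluster's slot count, or an integer multiple of it for very large inputs) can be sketched as a hypothetical helper. The per-task size cap is an assumed knob for bounding failure-recovery cost, not a Hadoop parameter.

```python
import math

def choose_num_map_tasks(input_bytes, total_map_slots, max_task_bytes):
    """Pick the smallest integer multiple of the cluster's Map slots such
    that each task stays under max_task_bytes: one full wave per multiple,
    and finer splits cut the cost of re-running a failed task on big inputs."""
    waves = max(1, math.ceil(input_bytes / (total_map_slots * max_task_bytes)))
    return waves * total_map_slots

# 24 slots, 12 GiB input, tasks capped at 256 MiB: two full waves, 48 tasks.
assert choose_num_map_tasks(12 << 30, 24, 256 << 20) == 48
# A small input fits in one wave, so the slot count itself is optimal.
assert choose_num_map_tasks(1 << 30, 24, 256 << 20) == 24
```

Rounding up to whole waves keeps every slot busy with a normal task in each wave, which is exactly the condition Table I associates with the optimal and sub-optimal execution times.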
As a result, the number of tasks for such jobs is encouraged to be an integer multiple of the total number of slots in the cluster.

VII. CONCLUSION
How to configure the Hadoop system according to its operations and mechanisms in job execution is of great significance to execution performance. In this paper, we performed extensive experiments to gain insights from a practical perspective. Based on the experimental observations, we provide suggestions on system configuration, particularly on granularity determination.

REFERENCES
[1] A. Menon, "Big data @ Facebook," in Proc. ACM Workshop on Management of Big Data Systems, 2012.
[2] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107-113, 2008.
[3] T. White, Hadoop: The Definitive Guide. O'Reilly Media, Inc.
[4] Hadoop Wiki: Powered By. [Online]. Available: hadoop/poweredby
[5] S. Kavulya, J. Tan, R. Gandhi, and P. Narasimhan, "An analysis of traces from a production MapReduce cluster," in Proc. IEEE/ACM Int. Conf. Cluster, Cloud and Grid Computing (CCGrid), 2010.
[6] K. Kc and K. Anyanwu, "Scheduling Hadoop jobs to meet deadlines," in Proc. IEEE Int. Conf. Cloud Computing Technology and Science (CloudCom), 2010.
[7] T. Sandholm and K. Lai, "Dynamic proportional share scheduling in Hadoop," in Job Scheduling Strategies for Parallel Processing. Springer, 2010.
[8] Z. Guo, G. Fox, and M. Zhou, "Investigation of data locality in MapReduce," in Proc. IEEE/ACM Int. Symp. Cluster, Cloud and Grid Computing (CCGrid), 2012.
[9] X. Zhang, Z. Zhong, S. Feng, B. Tu, and J. Fan, "Improving data locality of MapReduce by scheduling in homogeneous computing environments," in Proc. IEEE Int. Symp. Parallel and Distributed Processing with Applications (ISPA), 2011.
[10] J. Jin, J. Luo, A. Song, F. Dong, and R. Xiong, "BAR: an efficient data locality driven task scheduling algorithm for cloud computing," in Proc. IEEE/ACM Int. Symp. Cluster, Cloud and Grid Computing (CCGrid), 2011.
[11] W. Wang, K. Zhu, L. Ying, J. Tan, and L. Zhang, "A throughput optimal algorithm for map task scheduling in MapReduce with data locality," ACM SIGMETRICS Performance Evaluation Review, vol. 40, 2012.
[12] Apache Hadoop YARN. [Online]. Available: current/hadoop-yarn/hadoop-yarn-site/yarn.html
[13] Y. Mao, R. Morris, and M. F. Kaashoek, "Optimizing MapReduce for multicore architectures," Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Tech. Rep., 2010.
[14] Intel, "Optimizing Hadoop Deployments." [Online]. Available: cloud-computing-optimizing-hadoop-deployments-paper.pdf
[15] Wikipedia Database Dump. [Online].
[16] Free ebooks: Project Gutenberg. [Online]. Available: gutenberg.org/
[17] F. Dinu and T. Ng, "Understanding the effects and implications of compute node related failures in Hadoop," in Proc. ACM Int. Symp. High-Performance Parallel and Distributed Computing (HPDC), 2012.
More informationMapReduce and Hadoop Distributed File System
MapReduce and Hadoop Distributed File System 1 B. RAMAMURTHY Contact: Dr. Bina Ramamurthy CSE Department University at Buffalo (SUNY) bina@buffalo.edu http://www.cse.buffalo.edu/faculty/bina Partially
More informationComputing Load Aware and Long-View Load Balancing for Cluster Storage Systems
215 IEEE International Conference on Big Data (Big Data) Computing Load Aware and Long-View Load Balancing for Cluster Storage Systems Guoxin Liu and Haiying Shen and Haoyu Wang Department of Electrical
More informationEnergy Efficient MapReduce
Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing
More informationBig Data and Apache Hadoop s MapReduce
Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23
More informationReducer Load Balancing and Lazy Initialization in Map Reduce Environment S.Mohanapriya, P.Natesan
Reducer Load Balancing and Lazy Initialization in Map Reduce Environment S.Mohanapriya, P.Natesan Abstract Big Data is revolutionizing 21st-century with increasingly huge amounts of data to store and be
More informationA Framework for Performance Analysis and Tuning in Hadoop Based Clusters
A Framework for Performance Analysis and Tuning in Hadoop Based Clusters Garvit Bansal Anshul Gupta Utkarsh Pyne LNMIIT, Jaipur, India Email: [garvit.bansal anshul.gupta utkarsh.pyne] @lnmiit.ac.in Manish
More informationIntroduction to Big Data! with Apache Spark" UC#BERKELEY#
Introduction to Big Data! with Apache Spark" UC#BERKELEY# This Lecture" The Big Data Problem" Hardware for Big Data" Distributing Work" Handling Failures and Slow Machines" Map Reduce and Complex Jobs"
More informationDynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters
IEEE TRANSACTIONS ON CLOUD COMPUTING 1 DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters Shanjiang Tang, Bu-Sung Lee, Bingsheng He Abstract MapReduce is a popular computing
More informationScheduling Data Intensive Workloads through Virtualization on MapReduce based Clouds
ABSTRACT Scheduling Data Intensive Workloads through Virtualization on MapReduce based Clouds 1 B.Thirumala Rao, 2 L.S.S.Reddy Department of Computer Science and Engineering, Lakireddy Bali Reddy College
More informationhttp://www.paper.edu.cn
5 10 15 20 25 30 35 A platform for massive railway information data storage # SHAN Xu 1, WANG Genying 1, LIU Lin 2** (1. Key Laboratory of Communication and Information Systems, Beijing Municipal Commission
More informationGraySort on Apache Spark by Databricks
GraySort on Apache Spark by Databricks Reynold Xin, Parviz Deyhim, Ali Ghodsi, Xiangrui Meng, Matei Zaharia Databricks Inc. Apache Spark Sorting in Spark Overview Sorting Within a Partition Range Partitioner
More informationBig Data Storage Architecture Design in Cloud Computing
Big Data Storage Architecture Design in Cloud Computing Xuebin Chen 1, Shi Wang 1( ), Yanyan Dong 1, and Xu Wang 2 1 College of Science, North China University of Science and Technology, Tangshan, Hebei,
More informationAnalysis of Information Management and Scheduling Technology in Hadoop
Analysis of Information Management and Scheduling Technology in Hadoop Ma Weihua, Zhang Hong, Li Qianmu, Xia Bin School of Computer Science and Technology Nanjing University of Science and Engineering
More informationA Middleware Strategy to Survive Compute Peak Loads in Cloud
A Middleware Strategy to Survive Compute Peak Loads in Cloud Sasko Ristov Ss. Cyril and Methodius University Faculty of Information Sciences and Computer Engineering Skopje, Macedonia Email: sashko.ristov@finki.ukim.mk
More informationOpen source Google-style large scale data analysis with Hadoop
Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical
More informationISSN:2320-0790. Keywords: HDFS, Replication, Map-Reduce I Introduction:
ISSN:2320-0790 Dynamic Data Replication for HPC Analytics Applications in Hadoop Ragupathi T 1, Sujaudeen N 2 1 PG Scholar, Department of CSE, SSN College of Engineering, Chennai, India 2 Assistant Professor,
More informationOptimization of Distributed Crawler under Hadoop
MATEC Web of Conferences 22, 0202 9 ( 2015) DOI: 10.1051/ matecconf/ 2015220202 9 C Owned by the authors, published by EDP Sciences, 2015 Optimization of Distributed Crawler under Hadoop Xiaochen Zhang*
More informationCost-effective Resource Provisioning for MapReduce in a Cloud
1 -effective Resource Provisioning for MapReduce in a Cloud Balaji Palanisamy, Member, IEEE, Aameek Singh, Member, IEEE Ling Liu, Senior Member, IEEE Abstract This paper presents a new MapReduce cloud
More informationAn Experimental Study of Load Balancing of OpenNebula Open-Source Cloud Computing Platform
An Experimental Study of Load Balancing of OpenNebula Open-Source Cloud Computing Platform A B M Moniruzzaman 1, Kawser Wazed Nafi 2, Prof. Syed Akhter Hossain 1 and Prof. M. M. A. Hashem 1 Department
More informationBenchmarking Hadoop & HBase on Violin
Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages
More informationA Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing
A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing Liang-Teh Lee, Kang-Yuan Liu, Hui-Yang Huang and Chia-Ying Tseng Department of Computer Science and Engineering,
More informationReduction of Data at Namenode in HDFS using harballing Technique
Reduction of Data at Namenode in HDFS using harballing Technique Vaibhav Gopal Korat, Kumar Swamy Pamu vgkorat@gmail.com swamy.uncis@gmail.com Abstract HDFS stands for the Hadoop Distributed File System.
More informationHadoop on a Low-Budget General Purpose HPC Cluster in Academia
Hadoop on a Low-Budget General Purpose HPC Cluster in Academia Paolo Garza, Paolo Margara, Nicolò Nepote, Luigi Grimaudo, and Elio Piccolo Dipartimento di Automatica e Informatica, Politecnico di Torino,
More informationR.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5
Distributed data processing in heterogeneous cloud environments R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5 1 uskenbaevar@gmail.com, 2 abu.kuandykov@gmail.com,
More informationExploiting Cloud Heterogeneity for Optimized Cost/Performance MapReduce Processing
Exploiting Cloud Heterogeneity for Optimized Cost/Performance MapReduce Processing Zhuoyao Zhang University of Pennsylvania, USA zhuoyao@seas.upenn.edu Ludmila Cherkasova Hewlett-Packard Labs, USA lucy.cherkasova@hp.com
More informationEnhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications
Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications Ahmed Abdulhakim Al-Absi, Dae-Ki Kang and Myong-Jong Kim Abstract In Hadoop MapReduce distributed file system, as the input
More informationA SIMULATOR FOR LOAD BALANCING ANALYSIS IN DISTRIBUTED SYSTEMS
Mihai Horia Zaharia, Florin Leon, Dan Galea (3) A Simulator for Load Balancing Analysis in Distributed Systems in A. Valachi, D. Galea, A. M. Florea, M. Craus (eds.) - Tehnologii informationale, Editura
More informationMobile Cloud Computing for Data-Intensive Applications
Mobile Cloud Computing for Data-Intensive Applications Senior Thesis Final Report Vincent Teo, vct@andrew.cmu.edu Advisor: Professor Priya Narasimhan, priya@cs.cmu.edu Abstract The computational and storage
More informationAn Adaptive Scheduling Algorithm for Dynamic Heterogeneous Hadoop Systems
An Adaptive Scheduling Algorithm for Dynamic Heterogeneous Hadoop Systems Aysan Rasooli, Douglas G. Down Department of Computing and Software McMaster University {rasooa, downd}@mcmaster.ca Abstract The
More informationSCHEDULING IN CLOUD COMPUTING
SCHEDULING IN CLOUD COMPUTING Lipsa Tripathy, Rasmi Ranjan Patra CSA,CPGS,OUAT,Bhubaneswar,Odisha Abstract Cloud computing is an emerging technology. It process huge amount of data so scheduling mechanism
More informationBig Data Analysis and Its Scheduling Policy Hadoop
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 1, Ver. IV (Jan Feb. 2015), PP 36-40 www.iosrjournals.org Big Data Analysis and Its Scheduling Policy
More informationHadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN
Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current
More informationHiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group
HiBench Introduction Carson Wang (carson.wang@intel.com) Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is
More informationFigure 1. The cloud scales: Amazon EC2 growth [2].
- Chung-Cheng Li and Kuochen Wang Department of Computer Science National Chiao Tung University Hsinchu, Taiwan 300 shinji10343@hotmail.com, kwang@cs.nctu.edu.tw Abstract One of the most important issues
More informationNon-intrusive Slot Layering in Hadoop
213 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing Non-intrusive Layering in Hadoop Peng Lu, Young Choon Lee, Albert Y. Zomaya Center for Distributed and High Performance Computing,
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.
More informationPerformance and Energy Efficiency of. Hadoop deployment models
Performance and Energy Efficiency of Hadoop deployment models Contents Review: What is MapReduce Review: What is Hadoop Hadoop Deployment Models Metrics Experiment Results Summary MapReduce Introduced
More informationCURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NON-LINEAR PROGRAMMING
Journal homepage: http://www.journalijar.com INTERNATIONAL JOURNAL OF ADVANCED RESEARCH RESEARCH ARTICLE CURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NON-LINEAR PROGRAMMING R.Kohila
More informationMapReduce and Hadoop Distributed File System V I J A Y R A O
MapReduce and Hadoop Distributed File System 1 V I J A Y R A O The Context: Big-data Man on the moon with 32KB (1969); my laptop had 2GB RAM (2009) Google collects 270PB data in a month (2007), 20000PB
More informationA Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique
A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique Jyoti Malhotra 1,Priya Ghyare 2 Associate Professor, Dept. of Information Technology, MIT College of
More informationIntroduction to Hadoop
1 What is Hadoop? Introduction to Hadoop We are living in an era where large volumes of data are available and the problem is to extract meaning from the data avalanche. The goal of the software tools
More informationHadoop Cluster Applications
Hadoop Overview Data analytics has become a key element of the business decision process over the last decade. Classic reporting on a dataset stored in a database was sufficient until recently, but yesterday
More informationImproving MapReduce Performance in Heterogeneous Environments
UC Berkeley Improving MapReduce Performance in Heterogeneous Environments Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica University of California at Berkeley Motivation 1. MapReduce
More informationMapReduce Approach to Collective Classification for Networks
MapReduce Approach to Collective Classification for Networks Wojciech Indyk 1, Tomasz Kajdanowicz 1, Przemyslaw Kazienko 1, and Slawomir Plamowski 1 Wroclaw University of Technology, Wroclaw, Poland Faculty
More informationCSE-E5430 Scalable Cloud Computing Lecture 2
CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Ensuring Reliability and High Availability in Cloud by Employing a Fault Tolerance Enabled Load Balancing Algorithm G.Gayathri [1], N.Prabakaran [2] Department of Computer
More informationTask Scheduling in Hadoop
Task Scheduling in Hadoop Sagar Mamdapure Munira Ginwala Neha Papat SAE,Kondhwa SAE,Kondhwa SAE,Kondhwa Abstract Hadoop is widely used for storing large datasets and processing them efficiently under distributed
More informationEnergy-Saving Cloud Computing Platform Based On Micro-Embedded System
Energy-Saving Cloud Computing Platform Based On Micro-Embedded System Wen-Hsu HSIEH *, San-Peng KAO **, Kuang-Hung TAN **, Jiann-Liang CHEN ** * Department of Computer and Communication, De Lin Institute
More informationMethodology for predicting the energy consumption of SPMD application on virtualized environments *
Methodology for predicting the energy consumption of SPMD application on virtualized environments * Javier Balladini, Ronal Muresano +, Remo Suppi +, Dolores Rexachs + and Emilio Luque + * Computer Engineering
More informationMapReduce and Hadoop. Aaron Birkland Cornell Center for Advanced Computing. January 2012
MapReduce and Hadoop Aaron Birkland Cornell Center for Advanced Computing January 2012 Motivation Simple programming model for Big Data Distributed, parallel but hides this Established success at petabyte
More informationShareability and Locality Aware Scheduling Algorithm in Hadoop for Mobile Cloud Computing
Shareability and Locality Aware Scheduling Algorithm in Hadoop for Mobile Cloud Computing Hsin-Wen Wei 1,2, Che-Wei Hsu 2, Tin-Yu Wu 3, Wei-Tsong Lee 1 1 Department of Electrical Engineering, Tamkang University
More informationCloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms
CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms Rodrigo N. Calheiros, Rajiv Ranjan, Anton Beloglazov, César A. F. De Rose,
More informationText Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies
Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Somesh S Chavadi 1, Dr. Asha T 2 1 PG Student, 2 Professor, Department of Computer Science and Engineering,
More informationHow To Balance In Cloud Computing
A Review on Load Balancing Algorithms in Cloud Hareesh M J Dept. of CSE, RSET, Kochi hareeshmjoseph@ gmail.com John P Martin Dept. of CSE, RSET, Kochi johnpm12@gmail.com Yedhu Sastri Dept. of IT, RSET,
More informationFederated Big Data for resource aggregation and load balancing with DIRAC
Procedia Computer Science Volume 51, 2015, Pages 2769 2773 ICCS 2015 International Conference On Computational Science Federated Big Data for resource aggregation and load balancing with DIRAC Víctor Fernández
More informationThe Improved Job Scheduling Algorithm of Hadoop Platform
The Improved Job Scheduling Algorithm of Hadoop Platform Yingjie Guo a, Linzhi Wu b, Wei Yu c, Bin Wu d, Xiaotian Wang e a,b,c,d,e University of Chinese Academy of Sciences 100408, China b Email: wulinzhi1001@163.com
More informationEvaluating partitioning of big graphs
Evaluating partitioning of big graphs Fredrik Hallberg, Joakim Candefors, Micke Soderqvist fhallb@kth.se, candef@kth.se, mickeso@kth.se Royal Institute of Technology, Stockholm, Sweden Abstract. Distributed
More informationRole of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,
More informationCLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES
CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES 1 MYOUNGJIN KIM, 2 CUI YUN, 3 SEUNGHO HAN, 4 HANKU LEE 1,2,3,4 Department of Internet & Multimedia Engineering,
More informationPerformance Prediction, Sizing and Capacity Planning for Distributed E-Commerce Applications
Performance Prediction, Sizing and Capacity Planning for Distributed E-Commerce Applications by Samuel D. Kounev (skounev@ito.tu-darmstadt.de) Information Technology Transfer Office Abstract Modern e-commerce
More informationA PERFORMANCE ANALYSIS of HADOOP CLUSTERS in OPENSTACK CLOUD and in REAL SYSTEM
A PERFORMANCE ANALYSIS of HADOOP CLUSTERS in OPENSTACK CLOUD and in REAL SYSTEM Ramesh Maharjan and Manoj Shakya Department of Computer Science and Engineering Dhulikhel, Kavre, Nepal lazymesh@gmail.com,
More informationAnalysis and Modeling of MapReduce s Performance on Hadoop YARN
Analysis and Modeling of MapReduce s Performance on Hadoop YARN Qiuyi Tang Dept. of Mathematics and Computer Science Denison University tang_j3@denison.edu Dr. Thomas C. Bressoud Dept. of Mathematics and
More informationIndex Terms : Load rebalance, distributed file systems, clouds, movement cost, load imbalance, chunk.
Load Rebalancing for Distributed File Systems in Clouds. Smita Salunkhe, S. S. Sannakki Department of Computer Science and Engineering KLS Gogte Institute of Technology, Belgaum, Karnataka, India Affiliated
More informationAn Efficient Hybrid P2P MMOG Cloud Architecture for Dynamic Load Management. Ginhung Wang, Kuochen Wang
1 An Efficient Hybrid MMOG Cloud Architecture for Dynamic Load Management Ginhung Wang, Kuochen Wang Abstract- In recent years, massively multiplayer online games (MMOGs) become more and more popular.
More informationMap-Parallel Scheduling (mps) using Hadoop environment for job scheduler and time span for Multicore Processors
Map-Parallel Scheduling (mps) using Hadoop environment for job scheduler and time span for Sudarsanam P Abstract G. Singaravel Parallel computing is an base mechanism for data process with scheduling task,
More informationPROBLEM DIAGNOSIS FOR CLOUD COMPUTING
PROBLEM DIAGNOSIS FOR CLOUD COMPUTING Jiaqi Tan, Soila Kavulya, Xinghao Pan, Mike Kasick, Keith Bare, Eugene Marinelli, Rajeev Gandhi Priya Narasimhan Carnegie Mellon University Automated Problem Diagnosis
More informationSurvey on Job Schedulers in Hadoop Cluster
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 15, Issue 1 (Sep. - Oct. 2013), PP 46-50 Bincy P Andrews 1, Binu A 2 1 (Rajagiri School of Engineering and Technology,
More informationChapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
More informationPerformance Modeling of MapReduce Jobs in Heterogeneous Cloud Environments Zhuoyao Zhang University of Pennsylvania zhuoyao@seas.upenn.
Performance Modeling of MapReduce Jobs in Heterogeneous Cloud Environments Zhuoyao Zhang University of Pennsylvania zhuoyao@seas.upenn.edu Ludmila Cherkasova Hewlett-Packard Labs lucy.cherkasova@hp.com
More informationThe Performance Characteristics of MapReduce Applications on Scalable Clusters
The Performance Characteristics of MapReduce Applications on Scalable Clusters Kenneth Wottrich Denison University Granville, OH 43023 wottri_k1@denison.edu ABSTRACT Many cluster owners and operators have
More informationResidual Traffic Based Task Scheduling in Hadoop
Residual Traffic Based Task Scheduling in Hadoop Daichi Tanaka University of Tsukuba Graduate School of Library, Information and Media Studies Tsukuba, Japan e-mail: s1421593@u.tsukuba.ac.jp Masatoshi
More informationHigh Performance Computing MapReduce & Hadoop. 17th Apr 2014
High Performance Computing MapReduce & Hadoop 17th Apr 2014 MapReduce Programming model for parallel processing vast amounts of data (TBs/PBs) distributed on commodity clusters Borrows from map() and reduce()
More information