INFO5011. Cloud Computing Semester 2, 2011 Lecture 11, Cloud Scheduling


1 INFO5011 Cloud Computing Semester 2, 2011 Lecture 11, Cloud Scheduling
COMMONWEALTH OF AUSTRALIA Copyright Regulations 1969 WARNING This material has been reproduced and communicated to you by or on behalf of the University of Sydney pursuant to Part VB of the Copyright Act 1968 (the Act). The material in this communication may be subject to copyright under the Act. Any further reproduction or communication of this material by you may be the subject of copyright protection under the Act. Do not remove this notice.
The presentation is based on:
- Quincy: Fair Scheduling for Distributed Computing Clusters. Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar, and Andrew Goldberg. SOSP '09
- Improving MapReduce Performance in Heterogeneous Environments. Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, Ion Stoica. OSDI 2008
Some content and diagrams are from the papers and the authors' presentations

2 Outline
Motivation
- Default FIFO scheduling in Hadoop and its problems
Quincy scheduling for distributed computing clusters
- Proposed by Microsoft Research
- Built on a Dryad cluster
- Homogeneous assumption
- Focuses on determining which node runs which job
Scheduling in heterogeneous environments
- Proposed by the UC Berkeley RAD Lab
- Heterogeneous assumption
- Focuses on optimizing speculative execution

3 Motivation
Big clusters are used for jobs of varying sizes and durations
- Data from a production search cluster used at Microsoft
- Google: 395 s average time for a MapReduce job

4 Hadoop scheduling
The master knows beforehand the number of mappers and the number of reducers for each job.
It also knows the number of available task slots on each worker node.
It is responsible for assigning tasks to nodes with vacant slots.
The default and simplest scheduler uses a FIFO queue.
Diagram from the CACM version of the original MapReduce paper

5 The problem with the simple FIFO scheduler
Diagram label: maximum number of tasks
Problem: subsequent small jobs wait for large jobs to finish!

6 Hadoop running a single job
One job: hadoop jar wordcount.jar info5011wordcounter.wordcount assn_input/n07.txt countout
Hadoop job_ _1213 on hadoop = 7 blocks

7 Hadoop map task list for job_ _1213

8 Hadoop running 2 jobs concurrently
hadoop jar wordcount.jar info5011wordcounter.wordcount assn_input/n00p2.txt countoutn00
hadoop jar wordcount.jar info5011wordcounter.wordcount user/zhouy/assn_input/n07.txt countoutn07
A similar job to the first one, except with more reducers

9 Job summaries
Diagram annotations: first job finishes, second job starts

10 Hadoop FIFO queue status
The second, smaller job waits in the queue

11 Hadoop map task list for job_ _1214
Node labels in the screenshot: gpu1 gpu1 dm0 gpu0 gpu0 dm0 dm1 dm2 dm2

12 mapper 0 and mapper 9

13 Outline
Motivation
- Default FIFO scheduling in Hadoop and its problems
Quincy scheduling for distributed computing clusters
- Proposed by Microsoft Research
- Built on a Dryad cluster
- Homogeneous assumption
- Focuses on determining which node runs which job
Scheduling in heterogeneous environments
- Proposed by the UC Berkeley RAD Lab
- Heterogeneous assumption
- Focuses on optimizing speculative execution

14 Fair scheduling
Job X takes t seconds when it runs exclusively on a cluster.
X should take no more than Jt seconds when the cluster has J concurrent jobs.
Formally, for N computers and J jobs, each job should get at least N/J computers.

15 Other scheduling approaches: data locality constraints
HPC jobs
- Fetch data from a SAN, so there is no need to co-locate data and computation
Data-intensive workloads (MapReduce/Hadoop/Dryad)
- Have storage attached to the computers
- Scheduling tasks near their data improves performance
Jobtracker log:

16 Fairness vs. data locality
The requirements of fairness and locality often conflict
- A strategy that achieves optimal data locality will typically delay a job until its ideal resources are available
- Fairness benefits from allocating the best available resources to a job as soon as possible after they are requested
An important feature of data-intensive workloads (MapReduce/Hadoop/Dryad)
- While running, tasks are independent of each other, so killing one task will not impact another
- In contrast, MPI jobs are made of sets of stateful processes that are tightly coupled by communicating with each other across the network

17 Cluster architecture
Diagram from the authors' original presentation

18 Baseline algorithms
Greedy
- Equivalent to FIFO scheduling in Hadoop
Simple greedy fairness (GF)
- Based on Hadoop's Fair Scheduler
Queue-based scheduling
- Job j gets a baseline allocation $A^*_j = \min(M/K, N_j)$, where M is the number of computers, K is the number of concurrent jobs, and $N_j$ is the total number of running and waiting tasks for job j.
- If $\sum_j A^*_j < M$, the remaining slots are divided equally among jobs that have additional ready workers, so that the final allocation $A_j$ satisfies $\sum_j A_j = \min(M, \sum_j N_j)$.
- The scheduler blocks job j whenever it is running $A_j$ tasks or more. It only assigns tasks of an unblocked job to a newly available computer.
Fairness with preemption (GFP)
- When a job is running more than $A_j$ tasks, the scheduler kills its over-quota tasks, starting with the most recently scheduled tasks
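The allocation rule and the preemption order on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the schedulers' actual code: the function names `fair_allocations` and `over_quota_victims` are hypothetical, M/K is rounded down to an integer, and leftover slots are handed out one at a time in round-robin order as one way of dividing them "equally".

```python
def fair_allocations(num_computers, queued):
    """Return {job: A_j}. `queued` maps job -> N_j (running plus waiting tasks)."""
    jobs = list(queued)
    if not jobs:
        return {}
    base = num_computers // len(jobs)  # M/K, rounded down (assumption)
    alloc = {j: min(base, queued[j]) for j in jobs}  # baseline A*_j
    spare = num_computers - sum(alloc.values())
    # Hand out leftover slots round-robin to jobs that still have ready tasks.
    while spare > 0:
        hungry = [j for j in jobs if alloc[j] < queued[j]]
        if not hungry:
            break
        for j in hungry:
            if spare == 0:
                break
            alloc[j] += 1
            spare -= 1
    return alloc


def over_quota_victims(running, allocation):
    """GFP: pick tasks to kill, most recently scheduled first.

    `running` is a list of (task_id, start_time) pairs for one job.
    """
    excess = len(running) - allocation
    if excess <= 0:
        return []
    newest_first = sorted(running, key=lambda task: task[1], reverse=True)
    return [task_id for task_id, _ in newest_first[:excess]]


allocations = fair_allocations(10, {"a": 2, "b": 20, "c": 20})
victims = over_quota_victims([("t1", 1), ("t2", 5), ("t3", 3)], 1)
```

With 10 computers and three jobs, the small job keeps its 2 slots and the two large jobs split the remaining 8, so the allocations sum to min(M, total queued tasks).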

19 Simple greedy fairness
Diagram from the authors' original presentation

20 Sticky slots problem
Consider a steady state in which each job occupies exactly its allocated quota of computers.
- Whenever a task from job j completes on computer m, another task from j is assigned to m again
- m sticks to j indefinitely, whether or not j has any waiting tasks that have good data locality when run on m
Simple solution
- Do not unblock job j as soon as a task finishes
- Wait until the number of j's running tasks falls below $A_j - M_H$, where $M_H$ is a hysteresis margin, or until $\Delta_H$ seconds have passed
- In many cases, this delay is sufficient to allow another job's worker, with better locality, to steal computer m
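A minimal sketch of the hysteresis rule, assuming the timer for the delay starts when the job first drops below its allocation (the slide does not say when the clock starts). The class name `JobGate` and its fields are hypothetical.

```python
class JobGate:
    """Tracks whether a job is blocked, with a hysteresis margin and a timeout."""

    def __init__(self, allocation, margin, delay_seconds):
        self.allocation = allocation  # A_j
        self.margin = margin          # M_H, hysteresis margin
        self.delay = delay_seconds    # Delta_H
        self.blocked = False
        self.below_since = None       # time the job first fell below A_j

    def update(self, running, now):
        """Record the current number of running tasks; return True if blocked."""
        if running >= self.allocation:
            self.blocked = True
            self.below_since = None   # at or over quota: reset the timer
            return True
        if not self.blocked:
            return False
        if self.below_since is None:
            self.below_since = now
        far_below = running < self.allocation - self.margin
        timed_out = now - self.below_since >= self.delay
        if far_below or timed_out:
            self.blocked = False
            self.below_since = None
        return self.blocked


gate = JobGate(allocation=4, margin=2, delay_seconds=10)
states = [gate.update(4, 0), gate.update(3, 1), gate.update(3, 5), gate.update(1, 6)]
```

While the job stays blocked after one of its tasks finishes, the freed computer is offered to other jobs, which is what breaks the sticky assignment.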

21 Sticky slots illustrated (i)
Diagrams and example in the slides are from the authors' original presentation

22 Sticky slots illustrated (ii)

23 Sticky slots illustrated (iii)

24 Sticky slots illustrated (iv)

25 Sticky slots illustrated (v)

26 Sticky slots illustrated (vi)

27 Sticky slots illustrated (vii)

28 Quincy: flow-based scheduler
Main idea
- Matching = scheduling
- Each task is either scheduled on a computer c or unscheduled
- A cost can be assigned to any matching
- Fairness constrains the number of tasks that are scheduled
- The goal is to minimize the matching cost while obeying the fairness constraints
- This is a min-cost network flow problem
- Instead of making local (greedy) decisions, solve it globally

29 Graph construction (i)
Start with a directed graph representation of the cluster architecture.
Diagram labels: cluster aggregator, rack aggregator, individual computers, sink node

30 Graph construction (ii)
Diagram labels: job 1 with 6 tasks and a root task; unscheduled node for job 1
- Each task receives one unit of flow as its supply
- Each task has an edge to $U_j$
- There is a single edge from $U_j$ to the sink
- The edges from tasks to $U_j$ have a high cost

31 Graph construction (iii)
Add an edge from task T to computer C if C has some data for T.
The cost is a function of the amount of data that would be transferred across the rack and core switches.

32 Graph construction (iv)
Add an edge from task T to rack R if R has some data for T.
The cost on the edge is set to the worst-case cost that would result if the task were run on the least favorable computer in R.

33 Graph construction (v)
Add edges from all tasks T to the cluster aggregator X.
The cost on the edge is set to the worst-case cost of running the task on any computer in the cluster.

34 Graph construction (vi)
A zero-cost edge from the root task to its computer avoids preempting the root task.
A capacity constraint limits how many tasks can run on each computer.
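To make the graph construction concrete, here is a toy version solved with a small successive-shortest-path min-cost flow routine in plain Python. It is a sketch under assumptions, not Quincy's solver: the class `MinCostFlow`, the node names (`src`, `X`, `rack`, `U`, `c1`, `c2`), the single rack, and every cost and capacity value are made up for illustration, and an extra `src` node stands in for the one unit of supply at each task.

```python
from collections import defaultdict

INF = float("inf")


class MinCostFlow:
    """Successive-shortest-path min-cost flow (Bellman-Ford), for small graphs."""

    def __init__(self):
        self.adj = defaultdict(list)  # node -> indices into self.edges
        self.edges = []               # [head, residual capacity, cost]; edge i ^ 1 reverses edge i

    def add_edge(self, u, v, cap, cost):
        self.adj[u].append(len(self.edges))
        self.edges.append([v, cap, cost])
        self.adj[v].append(len(self.edges))
        self.edges.append([u, 0, -cost])

    def solve(self, source, sink):
        flow = cost = 0
        while True:
            dist = defaultdict(lambda: INF)
            dist[source] = 0
            prev = {}
            changed = True
            while changed:  # Bellman-Ford relaxation over the residual graph
                changed = False
                for u in list(self.adj):
                    if dist[u] == INF:
                        continue
                    for i in self.adj[u]:
                        v, cap, c = self.edges[i]
                        if cap > 0 and dist[u] + c < dist[v]:
                            dist[v], prev[v], changed = dist[u] + c, i, True
            if sink not in prev:
                return flow, cost
            path, v = [], sink
            while v != source:  # walk the cheapest augmenting path backwards
                path.append(prev[v])
                v = self.edges[prev[v] ^ 1][0]
            push = min(self.edges[i][1] for i in path)
            for i in path:
                self.edges[i][1] -= push
                self.edges[i ^ 1][1] += push
            flow += push
            cost += push * dist[sink]

    def first_hop(self, node):
        """Return the head of the forward edge out of `node` that carries flow."""
        for i in self.adj[node]:
            if i % 2 == 0 and self.edges[i ^ 1][1] > 0:
                return self.edges[i][0]
        return None


g = MinCostFlow()
tasks = {"t1": {"c1": 0}, "t2": {"c1": 2}, "t3": {"c2": 1}}  # preferred computer -> cost
for c in ("c1", "c2"):
    g.add_edge("rack", c, 1, 0)
    g.add_edge(c, "sink", 1, 0)   # one task slot per computer
g.add_edge("X", "rack", 2, 0)     # cluster aggregator -> rack aggregator
g.add_edge("U", "sink", 1, 0)     # at most one task may stay unscheduled
for t, prefs in tasks.items():
    g.add_edge("src", t, 1, 0)    # each task supplies one unit of flow
    for c, transfer_cost in prefs.items():
        g.add_edge(t, c, 1, transfer_cost)
    g.add_edge(t, "X", 1, 10)     # worst-case cost anywhere in the cluster
    g.add_edge(t, "U", 1, 20)     # high cost for staying unscheduled
total_flow, total_cost = g.solve("src", "sink")
placement = {t: g.first_hop(t) for t in tasks}
```

In this toy instance the two computers go to `t1` and `t3`, which have the cheapest local edges, and `t2` is routed through the unscheduled node. Lowering the capacity on the edge from `U` to the sink is how a fairness constraint forces more of a job's tasks onto computers.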

35 A feasible matching
Diagram label: unscheduled job

36 Final graph
Diagram annotations:
- Fairness constraint for job 1: setting it to 4 means at least 2 tasks from job 1 need to go through a computer
- Fairness constraint for job 2: setting it to 2 means at least 2 tasks from job 2 need to go through a computer

37 Some experiment results
Workload
- Typical Dryad jobs (Sort, Join, PageRank, WordCount, Prime)
- In total, 30 jobs with a mix of CPU-, disk-, and network-intensive tasks
- Prime is used as a worst-case job that hogs the cluster if started first
Cluster
- 240 computers in the cluster. 8 racks, computers per rack.
More than one metric is used for evaluation.

38 Results (i)

39 Results (ii)

40 Results (iii)

41 Results (iv)

42 Solver overhead
Discussion point
- The observed average overhead in this 240-machine cluster is 7.64 ms, with a maximum of 57.59 ms
- The simulated average overhead for 2500 computers running 100 concurrent jobs is a little over a second per solution
- This seems acceptable, but the min-cost flow is recomputed from scratch each time a change occurs
Applicable in other scheduling environments?
- The scheduling problem maps easily to a min-cost flow for two reasons:
- Tasks are relatively independent of each other, so there are no correlation constraints
- The capacity setting is one-dimensional

43 Outline
Motivation
- Default FIFO scheduling in Hadoop and its problems
Quincy scheduling for distributed computing clusters
- Proposed by Microsoft Research
- Built on a Dryad cluster
- Homogeneous assumption
- Focuses on determining which node runs which job
Scheduling in heterogeneous environments
- Proposed by the UC Berkeley RAD Lab
- Heterogeneous assumption
- Focuses on optimizing speculative execution

44 Speculative execution
Hadoop's straggler handling mechanism
- A node that is available but performing poorly is called a straggler
- MapReduce has a built-in mechanism that runs a speculative copy of the straggler's task on another machine to finish the computation faster
This paper tries to improve the performance of speculative execution by
- Defining a new scheduling metric
- Choosing the right machines to run speculative tasks
- Capping the amount of speculative execution

45 Progress score
Hadoop monitors task progress using a progress score to select speculative tasks
- A map task's progress score is the fraction of input data read
- A reduce task's execution is divided into three phases (copy, sort, reduce), each of which accounts for 1/3 of the score. Within each phase, the score is the fraction of data processed
When a task's progress score is less than the average for its category minus 0.2, and the task has run for at least one minute, it is marked as a straggler
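The scoring rule and the straggler test described on this slide can be written out as a short sketch. The function names and the tuple layout for tasks are assumptions made for the example; they are not Hadoop's API.

```python
PHASES = ("copy", "sort", "reduce")


def map_progress(bytes_read, input_bytes):
    """Progress score of a map task: fraction of input data read."""
    return bytes_read / input_bytes


def reduce_progress(phase, fraction_done):
    """Progress score of a reduce task: each of the three phases counts for 1/3."""
    return (PHASES.index(phase) + fraction_done) / 3


def stragglers(tasks, now, min_runtime=60.0, gap=0.2):
    """Return ids of tasks whose score is below (category average - gap).

    `tasks` is a list of (task_id, progress_score, start_time) tuples that
    all belong to one category (all maps or all reduces).
    """
    if not tasks:
        return []
    average = sum(score for _, score, _ in tasks) / len(tasks)
    return [
        task_id
        for task_id, score, start in tasks
        if score < average - gap and now - start >= min_runtime
    ]


halfway = reduce_progress("sort", 0.5)
slow = stragglers([("a", 0.9, 0), ("b", 0.8, 0), ("c", 0.2, 0), ("d", 0.3, 90)], now=100)
```

In the example, task `d` also has a low score but has only run for 10 seconds, so only `c` is flagged.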

46 Hadoop's assumptions
1. Nodes can perform work at exactly the same rate
2. Tasks progress at a constant rate throughout time
3. There is no cost to launching a speculative task on an idle node
4. The three phases of execution take approximately the same time
5. Tasks with a low progress score are stragglers
6. Maps and reduces require roughly the same amount of work

47 Breaking down the assumptions
The first two assumptions are about homogeneity. However,
- In a non-virtualized data center, there may be multiple generations of hardware
- In a virtualized data center, multiple VMs are co-located on the same physical host
Diagrams from Ang Li, Xiaowei Yang, Srikanth Kandula, and Ming Zhang. CloudCmp: Comparing Public Cloud Providers. In Proceedings of the 10th Annual Conference on Internet Measurement (IMC '10)

48 Heterogeneity is observed by other researchers
Jorg Schad, Jens Dittrich, and Jorge-Arnulfo Quiane-Ruiz. Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance. In Proceedings of the 36th International Conference on Very Large Data Bases (VLDB '10), September 13-17, 2010, Singapore.

49 Other assumptions
Assumption 3, that speculative tasks cost nothing, breaks down when resources are shared
- The network is a bottleneck, and speculative tasks may compete for disk I/O
Assumption 4, that a task's progress score is approximately equal to its percent completion, does not hold, especially for reduce tasks
- The copy phase usually accounts for more than 1/3 of the task execution time
Assumption 5, that progress score is a good proxy for progress rate because tasks begin at roughly the same time, can also be wrong
- The number of mappers depends on the number of blocks, which might be much larger than the number of available slots. The mappers tend to run in waves (see slide 11).

50 Longest Approximate Time to End (LATE)
Design principle of the LATE scheduler
- Always speculatively execute the task that we think will finish farthest into the future
Different methods can be used to estimate the time left. The paper proposes a simple heuristic based on progress rate
- ProgressRate = ProgressScore / T, where T is the amount of time the task has been running
- Estimated time to completion = (1 - ProgressScore) / ProgressRate
- This assumes that tasks make progress at a roughly constant rate (there are exceptions to this assumption)
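A small worked example of the heuristic, as a sketch. The function names are hypothetical, and returning infinity for a task that has made no progress yet is an assumption the slide does not cover.

```python
import math


def progress_rate(progress_score, elapsed_seconds):
    """ProgressRate = ProgressScore / time the task has been running."""
    return progress_score / elapsed_seconds


def estimated_time_left(progress_score, elapsed_seconds):
    """Estimated time to completion = (1 - ProgressScore) / ProgressRate."""
    if progress_score <= 0:
        return math.inf  # no progress yet: treat as arbitrarily far away (assumption)
    return (1 - progress_score) / progress_rate(progress_score, elapsed_seconds)


# Task A: 30% done after 30 s. Task B: 50% done after 200 s.
left_a = estimated_time_left(0.3, 30)
left_b = estimated_time_left(0.5, 200)
```

Task B has the higher progress score, yet its estimated time left (about 200 s) is longer than task A's (about 70 s), so LATE would rank B ahead of A for speculation, whereas a rule based on progress score alone would pick A.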

51 LATE parameters
- SlowNodeThreshold: used to select fast nodes on which to launch speculative tasks; set to the 25th percentile of node progress
- SpeculativeCap: used to control the number of speculative tasks that can be running at once; set to 10% of available task slots
- SlowTaskThreshold: used to select tasks for a speculative copy; set to the 25th percentile of task progress
- Currently LATE does not consider data locality
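One way the three parameters could fit together when a node asks for work is sketched below. Several details are assumptions: the nearest-rank percentile definition, measuring "task progress" by progress rate, treating a rate equal to the threshold as slow, and all names (`pick_speculative_task`, the task dictionary keys). It is an illustration of the idea, not the scheduler's code.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (an assumed definition; the slides do not fix one)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def time_left(task):
    """Estimated time to completion from progress score and elapsed time."""
    if task["score"] <= 0:
        return math.inf
    rate = task["score"] / task["elapsed"]
    return (1 - task["score"]) / rate


def pick_speculative_task(node_progress, all_node_progress, tasks,
                          running_speculative, total_slots,
                          speculative_cap=0.10, slow_node_pct=25, slow_task_pct=25):
    """Return the id of the task to speculate on the requesting node, or None."""
    if running_speculative >= speculative_cap * total_slots:
        return None  # SpeculativeCap reached
    if node_progress < percentile(all_node_progress, slow_node_pct):
        return None  # requesting node is slow: do not speculate here
    if not tasks:
        return None
    rates = {t["id"]: t["score"] / t["elapsed"] for t in tasks}
    slow_rate = percentile(rates.values(), slow_task_pct)
    candidates = [t for t in tasks
                  if not t["speculated"] and rates[t["id"]] <= slow_rate]
    if not candidates:
        return None
    return max(candidates, key=time_left)["id"]  # longest approximate time to end


running = [
    {"id": "m1", "score": 0.9, "elapsed": 100, "speculated": False},
    {"id": "m2", "score": 0.8, "elapsed": 100, "speculated": False},
    {"id": "m3", "score": 0.2, "elapsed": 100, "speculated": False},
    {"id": "m4", "score": 0.7, "elapsed": 100, "speculated": False},
]
choice = pick_speculative_task(50, [10, 50, 60, 80], running,
                               running_speculative=0, total_slots=20)
```

The checks run in order of cost: the cap and the node test reject a request before any per-task estimate is computed.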

52 Estimating finishing time
- The current ProgressRate computation assumes constant progress, which might not be true
- If a task's execution slows down in a later phase, the ProgressRate might suggest the wrong task to speculate
- If it speeds up, the final prediction is not affected
- Mapper tasks progress at a constant rate most of the time
- Reducer tasks are typically slowest in their first phase and speed up in later phases

53 Evaluation
Environment
- Amazon EC2 ( nodes)
- Small local testbed (9 nodes)
Measuring heterogeneity on EC2

54 Heterogeneity setup
Scheduling experiments
- Assign a varying number of VMs to each physical node
- Create stragglers by running CPU- and I/O-intensive processes on the same VM

55 EC2 Sort with heterogeneity
Each host sorted 128 MB, for a total of 30 GB of data
Each job has 486 map tasks and 437 reduce tasks

56 EC2 Sort with stragglers
Each node sorted 256 MB, for a total of 25 GB of data
Stragglers were created with 4 CPU-intensive processes (800 KB array sort) and 4 disk-intensive processes (dd tasks)

57 Sensitivity analysis

58 Conclusion
Advantages
- Considers the heterogeneity that appears in real-life systems
- LATE speculatively executes, on fast nodes, the tasks that hurt the response time the most
- LATE caps speculative tasks to avoid overloading resources
Limitations
- Does not consider data locality
- The finishing-time estimate may be wrong when a task slows down


More information

Process Scheduling CS 241. February 24, 2012. Copyright University of Illinois CS 241 Staff

Process Scheduling CS 241. February 24, 2012. Copyright University of Illinois CS 241 Staff Process Scheduling CS 241 February 24, 2012 Copyright University of Illinois CS 241 Staff 1 Announcements Mid-semester feedback survey (linked off web page) MP4 due Friday (not Tuesday) Midterm Next Tuesday,

More information

Keywords: Big Data, HDFS, Map Reduce, Hadoop

Keywords: Big Data, HDFS, Map Reduce, Hadoop Volume 5, Issue 7, July 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Configuration Tuning

More information

Towards Improving MapReduce Task Scheduling Using Online Simulation Based Predictions

Towards Improving MapReduce Task Scheduling Using Online Simulation Based Predictions Towards Improving MapReduce Task Scheduling Using Online Simulation Based s Guanying Wang, Aleksandr Khasymski, Krish K. R., Ali R. Butt Department of Computer Science, Virginia Tech Email: {wanggy, khasymskia,

More information

Big Data and Apache Hadoop s MapReduce

Big Data and Apache Hadoop s MapReduce Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23

More information

Facilitating Consistency Check between Specification and Implementation with MapReduce Framework

Facilitating Consistency Check between Specification and Implementation with MapReduce Framework Facilitating Consistency Check between Specification and Implementation with MapReduce Framework Shigeru KUSAKABE, Yoichi OMORI, and Keijiro ARAKI Grad. School of Information Science and Electrical Engineering,

More information

Hadoop Scheduler w i t h Deadline Constraint

Hadoop Scheduler w i t h Deadline Constraint Hadoop Scheduler w i t h Deadline Constraint Geetha J 1, N UdayBhaskar 2, P ChennaReddy 3,Neha Sniha 4 1,4 Department of Computer Science and Engineering, M S Ramaiah Institute of Technology, Bangalore,

More information

Use of Hadoop File System for Nuclear Physics Analyses in STAR

Use of Hadoop File System for Nuclear Physics Analyses in STAR 1 Use of Hadoop File System for Nuclear Physics Analyses in STAR EVAN SANGALINE UC DAVIS Motivations 2 Data storage a key component of analysis requirements Transmission and storage across diverse resources

More information

Dynamic Workload Management in Heterogeneous Cloud Computing Environments

Dynamic Workload Management in Heterogeneous Cloud Computing Environments Dynamic Workload Management in Heterogeneous Cloud Computing Environments Qi Zhang and Raouf Boutaba University of Waterloo IEEE/IFIP Network Operations and Management Symposium Krakow, Poland May 7, 2014

More information

Computing Load Aware and Long-View Load Balancing for Cluster Storage Systems

Computing Load Aware and Long-View Load Balancing for Cluster Storage Systems 215 IEEE International Conference on Big Data (Big Data) Computing Load Aware and Long-View Load Balancing for Cluster Storage Systems Guoxin Liu and Haiying Shen and Haoyu Wang Department of Electrical

More information

How To Make A Cluster Of Workable, Efficient, And Efficient With A Distributed Scheduler

How To Make A Cluster Of Workable, Efficient, And Efficient With A Distributed Scheduler : A Platform for Fine-Grained Resource Sharing in the Data Center Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, Ion Stoica University of California,

More information

Windows Server Performance Monitoring

Windows Server Performance Monitoring Spot server problems before they are noticed The system s really slow today! How often have you heard that? Finding the solution isn t so easy. The obvious questions to ask are why is it running slowly

More information

HYPER-CONVERGED INFRASTRUCTURE STRATEGIES

HYPER-CONVERGED INFRASTRUCTURE STRATEGIES 1 HYPER-CONVERGED INFRASTRUCTURE STRATEGIES MYTH BUSTING & THE FUTURE OF WEB SCALE IT 2 ROADMAP INFORMATION DISCLAIMER EMC makes no representation and undertakes no obligations with regard to product planning

More information

Dynamic Resource allocation in Cloud

Dynamic Resource allocation in Cloud Dynamic Resource allocation in Cloud ABSTRACT: Cloud computing allows business customers to scale up and down their resource usage based on needs. Many of the touted gains in the cloud model come from

More information

A Framework for Performance Analysis and Tuning in Hadoop Based Clusters

A Framework for Performance Analysis and Tuning in Hadoop Based Clusters A Framework for Performance Analysis and Tuning in Hadoop Based Clusters Garvit Bansal Anshul Gupta Utkarsh Pyne LNMIIT, Jaipur, India Email: [garvit.bansal anshul.gupta utkarsh.pyne] @lnmiit.ac.in Manish

More information

Energy Efficient MapReduce

Energy Efficient MapReduce Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing

More information

Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center

Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center : A Platform for Fine-Grained Resource Sharing in the Data Center Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, Ion Stoica University of California,

More information

Evaluating Task Scheduling in Hadoop-based Cloud Systems

Evaluating Task Scheduling in Hadoop-based Cloud Systems 2013 IEEE International Conference on Big Data Evaluating Task Scheduling in Hadoop-based Cloud Systems Shengyuan Liu, Jungang Xu College of Computer and Control Engineering University of Chinese Academy

More information

Figure 1. The cloud scales: Amazon EC2 growth [2].

Figure 1. The cloud scales: Amazon EC2 growth [2]. - Chung-Cheng Li and Kuochen Wang Department of Computer Science National Chiao Tung University Hsinchu, Taiwan 300 shinji10343@hotmail.com, kwang@cs.nctu.edu.tw Abstract One of the most important issues

More information

: Tiering Storage for Data Analytics in the Cloud

: Tiering Storage for Data Analytics in the Cloud : Tiering Storage for Data Analytics in the Cloud Yue Cheng, M. Safdar Iqbal, Aayush Gupta, Ali R. Butt Virginia Tech, IBM Research Almaden Cloud enables cost-efficient data analytics Amazon EMR Cloud

More information

Enhancing the Scalability of Virtual Machines in Cloud

Enhancing the Scalability of Virtual Machines in Cloud Enhancing the Scalability of Virtual Machines in Cloud Chippy.A #1, Ashok Kumar.P #2, Deepak.S #3, Ananthi.S #4 # Department of Computer Science and Engineering, SNS College of Technology Coimbatore, Tamil

More information

Storage I/O Control: Proportional Allocation of Shared Storage Resources

Storage I/O Control: Proportional Allocation of Shared Storage Resources Storage I/O Control: Proportional Allocation of Shared Storage Resources Chethan Kumar Sr. Member of Technical Staff, R&D VMware, Inc. Outline The Problem Storage IO Control (SIOC) overview Technical Details

More information

Do You Feel the Lag of Your Hadoop?

Do You Feel the Lag of Your Hadoop? Do You Feel the Lag of Your Hadoop? Yuxuan Jiang, Zhe Huang, and Danny H.K. Tsang Department of Electronic and Computer Engineering The Hong Kong University of Science and Technology, Hong Kong Email:

More information

Computing at Scale: Resource Scheduling Architectural Evolution and Introduction to Fuxi System

Computing at Scale: Resource Scheduling Architectural Evolution and Introduction to Fuxi System Computing at Scale: Resource Scheduling Architectural Evolution and Introduction to Fuxi System Renyu Yang( 杨 任 宇 ) Supervised by Prof. Jie Xu Ph.D. student@ Beihang University Research Intern @ Alibaba

More information

Exploring MapReduce Efficiency with Highly-Distributed Data

Exploring MapReduce Efficiency with Highly-Distributed Data Exploring MapReduce Efficiency with Highly-Distributed Data Michael Cardosa, Chenyu Wang, Anshuman Nangia, Abhishek Chandra, Jon Weissman University of Minnesota Minneapolis, MN, A {cardosa,chwang,nangia,chandra,jon}@cs.umn.edu

More information

The Impact of Capacity Scheduler Configuration Settings on MapReduce Jobs

The Impact of Capacity Scheduler Configuration Settings on MapReduce Jobs The Impact of Capacity Scheduler Configuration Settings on MapReduce Jobs Jagmohan Chauhan, Dwight Makaroff and Winfried Grassmann Dept. of Computer Science, University of Saskatchewan Saskatoon, SK, CANADA

More information

MSU Tier 3 Usage and Troubleshooting. James Koll

MSU Tier 3 Usage and Troubleshooting. James Koll MSU Tier 3 Usage and Troubleshooting James Koll Overview Dedicated computing for MSU ATLAS members Flexible user environment ~500 job slots of various configurations ~150 TB disk space 2 Condor commands

More information

International Journal of Computer & Organization Trends Volume21 Number1 June 2015 A Study on Load Balancing in Cloud Computing

International Journal of Computer & Organization Trends Volume21 Number1 June 2015 A Study on Load Balancing in Cloud Computing A Study on Load Balancing in Cloud Computing * Parveen Kumar * Er.Mandeep Kaur Guru kashi University,Talwandi Sabo Guru kashi University,Talwandi Sabo Abstract: Load Balancing is a computer networking

More information

See Spot Run: Using Spot Instances for MapReduce Workflows

See Spot Run: Using Spot Instances for MapReduce Workflows See Spot Run: Using Spot Instances for MapReduce Workflows Navraj Chohan Claris Castillo Mike Spreitzer Malgorzata Steinder Asser Tantawi Chandra Krintz IBM Watson Research Hawthorne, New York Computer

More information

Duke University http://www.cs.duke.edu/starfish

Duke University http://www.cs.duke.edu/starfish Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University http://www.cs.duke.edu/starfish Practitioners of Big Data Analytics Google Yahoo! Facebook ebay Physicists Biologists Economists

More information

Mammoth: Gearing Hadoop Towards Memory-Intensive MapReduce Applications

Mammoth: Gearing Hadoop Towards Memory-Intensive MapReduce Applications 1 Mammoth: Gearing Hadoop Towards Memory-Intensive MapReduce Applications Xuanhua Shi 1, Ming Chen 1, Ligang He 2,XuXie 1,LuLu 1, Hai Jin 1, Yong Chen 3, and Song Wu 1 1 SCTS/CGCL, School of Computer,

More information

DELL. Virtual Desktop Infrastructure Study END-TO-END COMPUTING. Dell Enterprise Solutions Engineering

DELL. Virtual Desktop Infrastructure Study END-TO-END COMPUTING. Dell Enterprise Solutions Engineering DELL Virtual Desktop Infrastructure Study END-TO-END COMPUTING Dell Enterprise Solutions Engineering 1 THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL

More information

Lecture Outline Overview of real-time scheduling algorithms Outline relative strengths, weaknesses

Lecture Outline Overview of real-time scheduling algorithms Outline relative strengths, weaknesses Overview of Real-Time Scheduling Embedded Real-Time Software Lecture 3 Lecture Outline Overview of real-time scheduling algorithms Clock-driven Weighted round-robin Priority-driven Dynamic vs. static Deadline

More information

High performance computing network for cloud environment using simulators

High performance computing network for cloud environment using simulators High performance computing network for cloud environment using simulators Ajith Singh. N 1 and M. Hemalatha 2 1 Ph.D, Research Scholar (CS), Karpagam University, Coimbatore, India 2 Prof & Head, Department

More information

Lecture 24: WSC, Datacenters. Topics: network-on-chip wrap-up, warehouse-scale computing and datacenters (Sections 6.1-6.7)

Lecture 24: WSC, Datacenters. Topics: network-on-chip wrap-up, warehouse-scale computing and datacenters (Sections 6.1-6.7) Lecture 24: WSC, Datacenters Topics: network-on-chip wrap-up, warehouse-scale computing and datacenters (Sections 6.1-6.7) 1 Topology Examples Grid Torus Hypercube Criteria 64 nodes Performance Bisection

More information

Big Data Processing using Hadoop. Shadi Ibrahim Inria, Rennes - Bretagne Atlantique Research Center

Big Data Processing using Hadoop. Shadi Ibrahim Inria, Rennes - Bretagne Atlantique Research Center Big Data Processing using Hadoop Shadi Ibrahim Inria, Rennes - Bretagne Atlantique Research Center Apache Hadoop Hadoop INRIA S.IBRAHIM 2 2 Hadoop Hadoop is a top- level Apache project» Open source implementation

More information

Cloud Management: Knowing is Half The Battle

Cloud Management: Knowing is Half The Battle Cloud Management: Knowing is Half The Battle Raouf BOUTABA David R. Cheriton School of Computer Science University of Waterloo Joint work with Qi Zhang, Faten Zhani (University of Waterloo) and Joseph

More information

This exam contains 13 pages (including this cover page) and 18 questions. Check to see if any pages are missing.

This exam contains 13 pages (including this cover page) and 18 questions. Check to see if any pages are missing. Big Data Processing 2013-2014 Q2 April 7, 2014 (Resit) Lecturer: Claudia Hauff Time Limit: 180 Minutes Name: Answer the questions in the spaces provided on this exam. If you run out of room for an answer,

More information

Optimizing Cost and Performance Trade-Offs for MapReduce Job Processing in the Cloud

Optimizing Cost and Performance Trade-Offs for MapReduce Job Processing in the Cloud Optimizing Cost and Performance Trade-Offs for MapReduce Job Processing in the Cloud Zhuoyao Zhang University of Pennsylvania zhuoyao@seas.upenn.edu Ludmila Cherkasova Hewlett-Packard Labs lucy.cherkasova@hp.com

More information

Resource Scalability for Efficient Parallel Processing in Cloud

Resource Scalability for Efficient Parallel Processing in Cloud Resource Scalability for Efficient Parallel Processing in Cloud ABSTRACT Govinda.K #1, Abirami.M #2, Divya Mercy Silva.J #3 #1 SCSE, VIT University #2 SITE, VIT University #3 SITE, VIT University In the

More information

Cloud Computing using MapReduce, Hadoop, Spark

Cloud Computing using MapReduce, Hadoop, Spark Cloud Computing using MapReduce, Hadoop, Spark Benjamin Hindman benh@cs.berkeley.edu Why this talk? At some point, you ll have enough data to run your parallel algorithms on multiple computers SPMD (e.g.,

More information

Research in Operating Systems Sparrow

Research in Operating Systems Sparrow Research in Operating Systems Sparrow Sparrow: Distributed, Low Latency Scheduling K. Ousterhout, P. Wendell, M. Zaharia and I. Stoica. In Proc. of SOSP 2013 *Slides partially based on Ousternout s presentation

More information

VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5

VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5 Performance Study VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5 VMware VirtualCenter uses a database to store metadata on the state of a VMware Infrastructure environment.

More information

Accelerate the Performance of Virtualized Databases Using PernixData FVP Software

Accelerate the Performance of Virtualized Databases Using PernixData FVP Software WHITE PAPER Accelerate the Performance of Virtualized Databases Using PernixData FVP Software Increase SQL Transactions and Minimize Latency with a Flash Hypervisor 1 Virtualization saves substantial time

More information

An Experimental Approach Towards Big Data for Analyzing Memory Utilization on a Hadoop cluster using HDFS and MapReduce.

An Experimental Approach Towards Big Data for Analyzing Memory Utilization on a Hadoop cluster using HDFS and MapReduce. An Experimental Approach Towards Big Data for Analyzing Memory Utilization on a Hadoop cluster using HDFS and MapReduce. Amrit Pal Stdt, Dept of Computer Engineering and Application, National Institute

More information

From GWS to MapReduce: Google s Cloud Technology in the Early Days

From GWS to MapReduce: Google s Cloud Technology in the Early Days Large-Scale Distributed Systems From GWS to MapReduce: Google s Cloud Technology in the Early Days Part II: MapReduce in a Datacenter COMP6511A Spring 2014 HKUST Lin Gu lingu@ieee.org MapReduce/Hadoop

More information