Modeling and Optimization of Resource Allocation in Cloud


1 / 40 Modeling and Optimization of Resource Allocation in Cloud. PhD Thesis Proposal. Atakan Aral. Thesis Advisor: Asst. Prof. Dr. Tolga Ovatman. Istanbul Technical University, Department of Computer Engineering. June 25, 2014

2 / 40 Outline: 1 Introduction 2 MapReduce Configuration 3 Resource Selection and Optimization 4 Work Distribution to Resources 5 Conclusion

4 / 40 Cloud Computing Definition Applications and services that run on a distributed network using virtualized resources and accessed by common Internet protocols and networking standards. Broad network access: Platform-independent, via standard methods Measured service: Pay-per-use, e.g. amount of storage/processing power, number of transactions, bandwidth etc. On-demand self-service: No need to contact provider to provision resources Rapid elasticity: Automatic scale up/out, illusion of infinite resources Resource pooling: Abstraction, virtualization, multi-tenancy

5 / 40 Cloud Computing Roots The cloud computing paradigm is revolutionary; the technology it is built on, however, is only evolutionary.

6 / 40 Cloud Computing Benefits Lower costs Ease of utilization Quality of Service Reliability Outsourced IT management Simplified maintenance and upgrade Lower barrier to entry

7 / 40 Cloud Computing Architecture

8 / 40 MapReduce Definition A programming model for processing large data sets with a parallel, distributed algorithm on a cluster.
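As a concrete instance of this model, the canonical word-count example can be written with bare map, shuffle, and reduce functions. This is a plain-Python sketch; in Hadoop itself the framework performs the grouping and distribution across the cluster.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every whitespace-separated word."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)

docs = ["the cat", "the dog"]
grouped = shuffle(chain.from_iterable(map_phase(d) for d in docs))
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
```

Each input split is mapped independently, which is what makes the model trivially parallel; only the shuffle step requires data movement between nodes.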

9 / 40 MapReduce Entities DataNode: Stores blocks of data in the distributed filesystem (HDFS). NameNode: Holds metadata (i.e. location information) of the files in HDFS. JobTracker: Coordinates the MapReduce job. TaskTracker: Runs the tasks that the MapReduce job is split into.

11 / 40 Life Cycle of Cloud Software: Development of Distributed Software (MapReduce Configuration), Resource Allocation (Resource Selection and Optimization), Load Balancing (Work Distribution to Resources)

12 / 40 Motivation In the cloud paradigm, the cost of using 1000 machines for 1 hour is the same as using 1 machine for 1000 hours (cost associativity). The optimum numbers of map and reduce tasks that maximize resource utilization depend on the resource consumption profile of the cloud software. Bottleneck resources should be identified. Optimization at the Application level

13 / 40 Resource Selection and Optimization Motivation In distributed computing environments, up to 85 percent of computing capacity remains idle, mainly due to poor placement optimization. Better assignment of virtual nodes to physical nodes may result in more efficient use of resources. There are several possible constraints to consider and optimize, e.g. capacity limits, proximity to the user, latency, etc. Optimization at the Infrastructure level

14 / 40 Work Distribution to Resources Motivation Data flows between nodes only in the shuffle step of the MapReduce job, and tuning it can have a big impact on job execution time. Mapper and reducer nodes should be selected carefully to minimize network traffic. Dynamic load balancing should be ensured. Optimization at Platform level

17 / 40 Introduction Resource Selection and Optimization Work Distribution to Resources Problem Aim Maximizing the utilization of all nodes for a Hadoop job by calculating the optimum parameters i.e. number of mappers and reducers Higher values mean higher parallelism but may cause resource contention and coordination problems. Optimum parameters depend on the resource consumption of the software.

18 / 40 Previous Solutions Kambatla, K., Pathak, A., and Pucha, H. (2009). Towards Optimizing Hadoop Provisioning in the Cloud. In Proceedings of the 1st USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), 118-122. Calculates the optimum parameters (number of map/reduce tasks) for a Hadoop job such that each resource set is fully utilized. Each application has a different bottleneck resource and utilization profile. Requires running a small chunk of the application to create a signature: the job is split into n intervals of equal duration, and the average consumption of all three resources (CPU, disk, and network) is calculated in each interval. The configuration of the most similar signature in the database is then used.
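The signature-matching idea above can be sketched as follows. The interval averaging follows the slide's description; the Euclidean distance metric and the data layout are illustrative assumptions, not the paper's exact formulation.

```python
import math

def signature(samples, n_intervals):
    """Split a run of (cpu, disk, network) samples into n equal-length
    intervals and average each resource within every interval."""
    size = len(samples) // n_intervals
    sig = []
    for i in range(n_intervals):
        chunk = samples[i * size:(i + 1) * size]
        sig.append(tuple(sum(s[r] for s in chunk) / len(chunk) for r in range(3)))
    return sig

def most_similar(sig, database):
    """Pick the stored configuration whose signature is closest to sig
    (Euclidean distance over all interval/resource pairs -- an assumption)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2
                             for p, q in zip(a, b)
                             for x, y in zip(p, q)))
    return min(database, key=lambda e: dist(sig, e["signature"]))["config"]
```

A CPU-bound sample run would then retrieve the configuration recorded for the most similar CPU-bound signature in the database.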

19 / 40 Previous Solutions

20 / 40 Suggested Solution The design model of the software will be statically analyzed in order to estimate its resource consumption pattern. Critical (bottleneck) resources will be identified. If required, source code or input data may also be included in the analysis.

21 / 40 Output An algorithm that calculates the optimum Hadoop configuration for a given piece of software An API that receives a software model and outputs a Hadoop configuration suggestion

23 / 40 Problem Aim Optimally assigning interconnected virtual nodes to a substrate network under constraints Constraints: Datacenter capacities and bandwidth Virtual topology requests and an incompletely known cloud topology Locality and jurisdiction Application interaction and scalability rules Objectives (minimization): Inter-DC communication Geographical distance to the user and latency

24 / 40 Previous Solutions Papagianni, C., Leivadeas, A., Papavassiliou, S., Maglaris, V., Cervello-Pastor, C., and Monje, A. (2013). On the Optimal Allocation of Virtual Resources in Cloud Computing Networks. IEEE Transactions on Computers (TC), 62(6):1060-1071. Maps user requests for virtual resources onto shared substrate resources. The problem is solved in two coordinated phases: node and link mapping. In node mapping, the objective is to minimize the cost of the mapping, and the method is relaxation of the MIP to an LP with randomized rounding. In link mapping, the objective is to minimize the number of hops, and the method is shortest-path or minimum-cost-flow algorithms. The suggested algorithm is compared with greedy heuristics in terms of acceptance ratio and number of hops.
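A minimal sketch of the two-phase decomposition, with a greedy heuristic standing in for the paper's MIP/LP node-mapping formulation and plain BFS for hop-minimal link mapping. All data structures here are illustrative, not the paper's.

```python
from collections import deque

def map_nodes(vnodes, substrate_cap):
    """Phase 1 (node mapping): greedily assign each virtual node, largest
    demand first, to the substrate node with the most remaining capacity."""
    cap = dict(substrate_cap)
    placement = {}
    for v, demand in sorted(vnodes.items(), key=lambda kv: -kv[1]):
        candidates = [s for s, c in cap.items() if c >= demand]
        if not candidates:
            return None  # request rejected: no feasible node mapping
        s = max(candidates, key=lambda n: cap[n])
        placement[v] = s
        cap[s] -= demand
    return placement

def map_link(src, dst, adjacency):
    """Phase 2 (link mapping): shortest substrate path in hops, via BFS."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in adjacency.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

Coordinating the two phases (e.g. remapping a node when its links cannot be embedded) is where the real algorithmic difficulty lies.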

25 / 40 Previous Solutions Larumbe, F. and Sanso, B. (2013). A Tabu Search Algorithm for the Location of Data Centers and Software Components in Green Cloud Computing Networks. IEEE Transactions on Cloud Computing (TCC), 1(1):22-35. Which component of the software should be hosted at which datacenter? Minimizes delay, cost, energy consumption, and CO2 emissions. A greedy initial solution is improved by moving one component at each step. A random subset of neighbours is analyzed for the best improvement. The suggested tabu search heuristic is compared with a MIP formulation and a greedy heuristic in terms of execution time and optimality. Tradeoff analysis between the multiple objectives
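A generic tabu-search skeleton illustrating the move-one-component-per-step scheme: at each iteration, a random sample of neighbouring solutions is scored and the best non-tabu one is taken. The cost function, sampling size, and tabu tenure are placeholders, not the paper's parameters.

```python
import random

def tabu_search(initial, neighbors, cost, iterations=50, tenure=5, sample=8, seed=0):
    """Improve `initial` by repeated single moves; a short tabu list of
    recently visited solutions prevents immediate cycling."""
    rng = random.Random(seed)
    current, best = initial, initial
    tabu = []  # short-term memory of visited solutions
    for _ in range(iterations):
        pool = neighbors(current)
        pool = rng.sample(pool, min(sample, len(pool)))
        moves = [n for n in pool if n not in tabu]
        if not moves:
            continue
        current = min(moves, key=cost)  # best move, even if worse than current
        tabu.append(current)
        if len(tabu) > tenure:
            tabu.pop(0)
        if cost(current) < cost(best):
            best = current
    return best
```

Accepting the best sampled move even when it worsens the cost is what lets tabu search escape local optima that a pure greedy improver would be stuck in.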

26 / 40 Previous Solutions Agarwal, S., Dunagan, J., Jain, N., Saroiu, S., Wolman, A., and Bhogan, H. (2010). Volley: Automated Data Placement for Geo-Distributed Cloud Services. In Proceedings of the 7th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 17-32. Volley analyzes trace data and suggests migrations to improve DC capacity skew, inter-DC traffic, and user latency. It considers client geographic diversity, client mobility, and data dependency. The method consists of 3 sequential phases: 1 Compute an initial placement using the weighted spherical mean, 2 Iteratively move data to reduce latency, 3 Iteratively collapse data to datacenters. Compared against 3 simple heuristics: onedc, commonip, and hash.
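The weighted spherical mean used in phase 1 can be approximated by averaging the clients' unit vectors in 3D and projecting the result back onto the sphere. This is a common simplification for illustration; Volley's own computation uses iterative spherical interpolation.

```python
import math

def weighted_spherical_mean(points, weights):
    """Approximate weighted mean of (lat, lon) points in degrees:
    average the corresponding 3D unit vectors, then renormalize."""
    x = y = z = 0.0
    for (lat, lon), w in zip(points, weights):
        la, lo = math.radians(lat), math.radians(lon)
        x += w * math.cos(la) * math.cos(lo)
        y += w * math.cos(la) * math.sin(lo)
        z += w * math.sin(la)
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    return math.degrees(math.asin(z)), math.degrees(math.atan2(y, x))
```

Weighting each client location by its request rate pulls the placement toward the heaviest users, which is exactly the effect the initial-placement phase is after.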

27 / 40 Suggested Solution VN requests and the DC network are represented as undirected weighted graphs. Nodes store computational capacities (i.e. CPU, memory) and storage. Edge weights represent bandwidth capacities. Graph similarity and subgraph matching algorithms will be employed. The suggested solution will be static.
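Under this graph representation, a candidate matching can at least be checked for capacity feasibility. The sketch below assumes each virtual edge maps onto a direct physical link; a full link-mapping step would generalize this to multi-hop paths. All names and structures are illustrative.

```python
def feasible(mapping, vn_nodes, vn_edges, dc_nodes, dc_bandwidth):
    """Check that assigning virtual nodes to DC nodes (mapping) respects
    per-resource node capacities and per-link bandwidth capacities."""
    # Node capacities: aggregate demand per physical node must fit.
    load = {}
    for v, p in mapping.items():
        for res, demand in vn_nodes[v].items():
            load.setdefault(p, {}).setdefault(res, 0)
            load[p][res] += demand
    for p, demands in load.items():
        for res, total in demands.items():
            if total > dc_nodes[p][res]:
                return False
    # Edge bandwidths: co-located endpoints need no physical link.
    for (u, v), bw in vn_edges.items():
        link = frozenset((mapping[u], mapping[v]))
        if len(link) > 1 and dc_bandwidth.get(link, 0) < bw:
            return False
    return True
```

A subgraph-matching search would enumerate candidate mappings and use a check like this to prune infeasible branches early.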

28 / 40 Output An algorithm that optimizes resource provisioning An API that receives graph and constraint inputs and outputs a matching Aral, A., and Ovatman, T. (2014). Improving Resource Utilization in Cloud Environments using Application Placement Heuristics. In Proceedings of the 4th International Conference on Cloud Computing and Services Science (CLOSER), 527-534.

30 / 40 Problem Aim Analysis of the shuffle and scheduling algorithms of the Apache Hadoop framework. The initial selection of DataNodes, Mappers, and Reducers is critical to reducing network traffic. Dynamic re-distribution of work during execution results in better load balancing and better performance.

31 / 40 Previous Solutions FIFO scheduler: Integrated scheduling algorithm, manual prioritization. Fair scheduler: Each job receives an equal share of resources over time, on average; jobs are organized into user pools that are resourced fairly. Capacity scheduler: Allows sharing a large cluster while giving each user a minimum capacity guarantee; priority queues are used instead of pools, and excess capacity is reused. Hadoop On Demand: Provisions virtual clusters from within larger physical clusters; adapts and scales when the workload changes. Learning scheduler: Chooses the task that is expected to provide maximum utility. Adaptive scheduler: The user specifies a deadline at job submission time, and the scheduler adjusts the resources to meet that deadline.
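The fair scheduler's equal-share behaviour can be illustrated with a max-min fair division of task slots among pools. This is a simplification of the actual Hadoop Fair Scheduler, which also supports pool weights and minimum shares.

```python
def max_min_fair(capacity, demands):
    """Max-min fair share of task slots: each round, split the remaining
    slots equally among unsatisfied pools, capping each at its demand."""
    alloc = {p: 0.0 for p in demands}
    remaining = float(capacity)
    active = {p for p, d in demands.items() if d > 0}
    while active and remaining > 1e-9:
        share = remaining / len(active)
        for p in list(active):
            give = min(share, demands[p] - alloc[p])
            alloc[p] += give
            remaining -= give
            if alloc[p] >= demands[p] - 1e-9:
                active.discard(p)  # pool satisfied; leftover redistributed
    return alloc
```

For example, with 10 slots and demands of 3, 5, and 8, the small pool is fully satisfied with 3 slots and the other two split the remaining 7 evenly at 3.5 each.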

32 / 40 Suggested Solution Cost-based analysis of the existing schedulers Which scheduler is more appropriate for given costs of the map, reduce, and shuffle phases? How can DataNode selection for blocks and clones be optimized to reduce network traffic?

33 / 40 Output Formal modeling and analysis of Hadoop schedulers An API that receives costs of MapReduce phases and outputs a scheduler suggestion A patch for the open-source Apache Hadoop framework

35 / 40 Timeline (2014-2016 Gantt chart): Research, Modeling, Development, and Documentation phases are planned for each work package, including Resource Selection and Work Distribution; a "Today" marker indicates the presentation date.

37 / 40 Summary Optimization in 3 related phases of the cloud software life cycle: 1 MapReduce Configuration 2 Resource Selection and Assignment 3 Work Distribution and Load Balancing Graph-based modeling of the cloud environment and the resource allocation problem A holistic approach that aims to increase the performance of cloud software (MapReduce) by optimizing resource allocation

38 / 40 Unique Aspects Inclusion of cloud computing aspects in the resource allocation (RA) problem: Virtual topology requests and an incompletely known network topology Locality and jurisdiction Scalability rules Inclusion of software characteristics (i.e. the design model) in the RA problem Optimization with different objectives at each level

39 / 40 Impact Our goal is to contribute to the long-held goal of utility computing. Cloud providers will benefit, since they will be able to use their resources more efficiently and serve more customers without violating service level agreements. Cloud users will also benefit, in terms of shorter application execution times and reduced infrastructure requirements (lower cost).

40 / 40 Thank you for your time.

Appendix I: Cloud Computing Architecture 1 / 2

2 / 2 Appendix II: Resource Allocation Problem in Cloud Conceptual phase 1 Resource modeling: Notations exist only for concrete resources. Different levels of granularity can be chosen. Vertical and horizontal heterogeneity cause interoperability problems. 2 Resource offering and treatment: Independent of modeling. New requirements are present in addition to the common network and computation ones. Operational phase 3 Resource discovery and monitoring: Finding suitable candidate resources based on proximity or impact on the network. Passive and active monitoring strategies exist. 4 Resource selection and optimization: Finding a configuration that fulfills all requirements and optimizes infrastructure usage. Dynamicity, algorithm complexity, and a large number of constraints are the main problems.