A View of Cloud Computing: Concepts and Challenges
Jorge G. Barbosa
Universidade do Porto, Faculdade de Engenharia, LIACC, Porto, Portugal
jbarbosa@fe.up.pt
FEUP, 2013

Outline
Part I: Basic Concepts - Introduction and Principles Overview
Part II: Challenges - Fault Tolerance, Energy Optimization, Quality of Service (QoS)
Part III: Current Research
What is Cloud Computing?
"Cloud Computing refers to both the applications delivered as services over the Internet and the hardware and systems software in the datacenters that provide those services."
Fox, Armando, et al. "Above the clouds: A Berkeley view of cloud computing." Dept. Electrical Eng. and Comput. Sciences, University of California, Berkeley, Rep. UCB/EECS 28 (2009).

"A large-scale distributed computing paradigm in which a pool of abstracted, virtualized, dynamically-scalable, managed computing power, storage, platforms, and services are delivered on demand over the Internet."
Foster, Ian, et al. "Cloud computing and grid computing 360-degree compared." Grid Computing Environments Workshop, GCE'08. IEEE, 2008.
Clouds / Cloud Computing
(Figure omitted.) Image source: The Future of Cloud Computing, available at http://cordis.europa.eu/fp7/ict/ssai/docs/cloud-report-final.pdf
TYPES
SaaS (Software as a Service) - What: on-demand access to any application. Who: end-user (consume).
PaaS (Platform as a Service) - What: a platform upon which apps/services can be developed and hosted. Who: developer (build).
IaaS (Infrastructure as a Service) - What: access to computational resources, i.e. CPU, RAM, data and storage. Who: hosting provider (host).

MODES
Private: usually owned by an institution; functionalities not directly exposed to the consumer (ex.: eBay).
Hybrid: mixed employment of private and public infrastructures, so as to reduce costs by sharing, but with the desired degree of control.
Public: owners offer their services to users outside of the institution (ex.: Amazon, Google Apps).
Image source: http://www.iland.com
FEATURES
Elasticity - leveraged by self-* properties; provides agility and adaptability to environment changes; implies horizontal and vertical scalability.
Reliability and Availability - ensures constant operation through redundant resource usage (ex.: fault tolerance); ability to deal with increasing concurrent access (ex.: load balancing).

BENEFITS
Quality of Service - support and maintenance of specified users' requirements to be met by the services and/or resources (ex.: response time).
Pay per use - services sold as Utility Computing; costs according to the actual consumption of resources.
Going Green - reduces the additional costs of energy consumption, but also the carbon footprint.
Virtualization Technology in Clouds
Virtualization is an essential technology in the Cloud: it underpins the cloud features (e.g. ease of use, flexibility and adaptability, location independence, etc.).
Image source: http://blog.cloudpassage.com
Hot Topics in Cloud Research
Fault tolerance - business continuity and service availability.
Energy efficiency - optimize energy consumption (ex.: maximize MFLOP/Joule); green cloud computing - minimize operational costs but also reduce the environmental impact.
Quality of Service - performance unpredictability (ex.: due to sharing of resources among co-located VMs).
Security - data security.
Interoperability - how do different clouds cooperate?
Standardization - how to guarantee that a user can change the cloud provider?
Autonomic Computing.
Fault Tolerance
Dependability of the infrastructure: distributed systems are growing in scale and in complexity. The Mean Time Between Failures (MTBF) would be 1.25 h on a petaflop system (1).
(1) Fu, S. "Failure-aware resource management for high-availability computing clusters with distributed virtual machines." Journal of Parallel and Distributed Computing 70.4 (2010): 384-393.

Proactive fault tolerance
Intelligent Platform Management Interface (IPMI) for health inquiries (migration starts on threshold violations); Ganglia to determine target nodes based on load averages.
In proactive FT systems, processes automatically migrate from unhealthy nodes to healthy ones. In reactive schemes, recovery occurs in response to failures that have already occurred.
(Figure: overall architecture.)
Nagarajan, A., et al. "Proactive fault tolerance for HPC with Xen virtualization." Proc. of the 21st Annual International Conference on Supercomputing. ACM, 2007.
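The proactive loop above can be sketched as follows. This is a minimal illustration, not the Nagarajan et al. implementation: the health metrics, thresholds, and node/VM record fields are all hypothetical stand-ins for what IPMI and Ganglia would report.

```python
# Sketch of a proactive fault-tolerance loop: node health metrics (as IPMI
# might report them) are polled, and VMs are migrated away from any node
# whose metric violates a threshold, towards the least-loaded healthy node
# (as Ganglia load averages would indicate). All names are illustrative.

HEALTH_THRESHOLDS = {"cpu_temp_c": 85.0, "fan_rpm_min": 1500.0}

def unhealthy(node):
    """A node is unhealthy if it is too hot or a fan spins too slowly."""
    return (node["cpu_temp_c"] > HEALTH_THRESHOLDS["cpu_temp_c"]
            or node["fan_rpm"] < HEALTH_THRESHOLDS["fan_rpm_min"])

def pick_target(nodes, exclude):
    """Choose the healthy node with the lowest load average."""
    candidates = [n for n in nodes if n["name"] != exclude and not unhealthy(n)]
    return min(candidates, key=lambda n: n["load_avg"], default=None)

def plan_migrations(nodes):
    """Return (vm, source, target) tuples for VMs hosted on unhealthy nodes."""
    plan = []
    for node in nodes:
        if unhealthy(node):
            for vm in node["vms"]:
                target = pick_target(nodes, exclude=node["name"])
                if target is not None:
                    plan.append((vm, node["name"], target["name"]))
    return plan
```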
Fault Tolerance
Dynamic allocation of VMs, considering PM reliability. Based on a failure-predictor tool with 75% average accuracy (1).
Optimistic Best-Fit (OBFIT) algorithm - selects the PM with minimum weighted available capacity and reliability (1).
Pessimistic Best-Fit (PBFIT) algorithm - calculates the average capacity C_avg of the reliable PMs, then selects the unreliable PM p with capacity C_p such that C_avg + C_p results in the minimum necessary capacity.
(Figure: proposed architecture for reconfigurable distributed virtual machines.)
(1) Fu, S. "Failure-aware resource management for high-availability computing clusters with distributed virtual machines." Journal of Parallel and Distributed Computing 70.4 (2010): 384-393.

System productivity is enhanced by the proposed strategies: the task completion rate reaches 91.7% with 83.6% utilization of relatively unreliable nodes.
(Figures: percentage of completed jobs; percentage of completed tasks.)
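The two selection rules can be sketched as below. This is an illustration of the idea only: the weighting between spare capacity and reliability, and the reliability threshold, are assumptions; Fu (2010) defines the exact scoring, which is not reproduced here.

```python
# Illustrative sketch of the OBFIT/PBFIT selection rules. PMs are dicts with
# "capacity" (spare processing power) and "reliability" (predicted, in [0,1]).
# The weight w and rel_threshold are hypothetical parameters.

def obfit(pms, demand, rel_threshold=0.75, w=0.5):
    """Among reliable PMs that can host the VM, pick the one minimizing a
    weighted combination of leftover capacity and (1 - reliability)."""
    feasible = [p for p in pms
                if p["reliability"] >= rel_threshold and p["capacity"] >= demand]
    if not feasible:
        return None
    score = lambda p: w * (p["capacity"] - demand) + (1 - w) * (1 - p["reliability"])
    return min(feasible, key=score)

def pbfit(pms, demand, rel_threshold=0.75):
    """When no reliable PM fits alone: take the average capacity C_avg of the
    reliable PMs and pick the unreliable PM p whose capacity C_p makes
    C_avg + C_p the smallest value still covering the demand."""
    reliable = [p["capacity"] for p in pms if p["reliability"] >= rel_threshold]
    if not reliable:
        return None
    c_avg = sum(reliable) / len(reliable)
    candidates = [p for p in pms
                  if p["reliability"] < rel_threshold and c_avg + p["capacity"] >= demand]
    return min(candidates, key=lambda p: c_avg + p["capacity"], default=None)
```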
Energy Efficiency
Energy consumption concern: an average datacenter consumes as much energy as 25,000 households (1). The main part of the energy consumption is determined by the CPU (2), and energy consumption dominates the operational costs.
(1) Kaplan, J., Forrest, W., Kindler, N. "Revolutionizing Data Center Energy Efficiency." McKinsey & Company, Tech. Rep.
(2) Berl, Andreas, et al. "Energy-efficient cloud computing." The Computer Journal 53.7 (2010): 1045-1051.
Energy Efficiency
Consolidation - minimize the number of active nodes, powering down inactive ones.
Dynamic Voltage and Frequency Scaling (DVFS) - modern CPUs can run at different clock frequencies.

Energy Efficiency - Examples
Entropy system: minimizes the number of active nodes, powering down inactive ones, while maintaining performance. It finds a configuration using the minimum number n of nodes necessary to host all VMs; constraint programming allows Entropy to find mappings of tasks to nodes, applied through a reconfiguration loop.
Hermenier, F., et al. "Entropy: a consolidation manager for clusters." Proc. of the 2009 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments. ACM, 2009.
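The consolidation objective can be illustrated with a simple bin-packing heuristic. Note this is not Entropy's actual algorithm (Entropy uses constraint programming to find provably minimal configurations); first-fit decreasing is only a sketch of the minimize-active-nodes idea.

```python
# Consolidation sketch: pack VM loads onto as few nodes as possible using
# first-fit decreasing. A new node is "powered up" only when the VM fits on
# no active node. Loads and capacity are in the same (normalized) unit.

def consolidate(vm_loads, node_capacity):
    """Return a list of nodes, each a list of the VM loads placed on it."""
    nodes = []
    for load in sorted(vm_loads, reverse=True):  # biggest VMs first
        for node in nodes:
            if sum(node) + load <= node_capacity:
                node.append(load)
                break
        else:
            nodes.append([load])  # no active node fits: power up a new one
    return nodes
```

With loads 0.6, 0.5, 0.5, 0.4 and unit-capacity nodes, the heuristic packs everything onto two nodes instead of four.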
Energy Efficiency - Examples
Entropy system results: reduces the consumption of cluster nodes per hour by over 50% as compared to static allocation.
(Figures: number of used physical machines; total execution time.)

DVFS-enabled clusters: the algorithm minimizes processor power dissipation by dynamically scaling down processor frequencies:
1) Minimize the processor supply voltage by scaling down the processor frequency.
2) Schedule VMs to PEs with low voltages and try not to scale PEs to high voltages.
von Laszewski, G., et al. "Power-aware scheduling of virtual machines in DVFS-enabled clusters." Cluster Computing and Workshops, CLUSTER'09. IEEE, 2009.
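The frequency-scaling step can be sketched as picking the lowest operating point that still meets the deadline. The operating points and the first-order dynamic-power model P ≈ C·V²·f below are illustrative assumptions, not values from the cited paper or any specific CPU.

```python
# DVFS sketch: choose the lowest (frequency, voltage) pair that finishes the
# VM's remaining work before its deadline. Dynamic power is approximated by
# the common first-order model P ~ C * V^2 * f. Operating points are made up.

OPERATING_POINTS = [  # (frequency in GHz, voltage in V), ascending
    (1.0, 0.9), (1.6, 1.0), (2.2, 1.2), (2.8, 1.35),
]

def dynamic_power(freq_ghz, volt, c=10.0):
    """First-order CMOS dynamic power estimate (arbitrary units)."""
    return c * volt ** 2 * freq_ghz

def choose_point(remaining_gcycles, deadline_s):
    """Lowest operating point whose frequency completes the work in time."""
    for freq, volt in OPERATING_POINTS:
        if remaining_gcycles / freq <= deadline_s:
            return freq, volt
    return OPERATING_POINTS[-1]  # deadline infeasible: run flat out
```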
Energy Efficiency
DVFS-enabled cluster results: applying the DVFS technique to the compute nodes (PEs) reduces overall power consumption without degrading the VMs' performance to unacceptable levels.
(Figures: performance impact of varying the number of VMs and the operating frequency; DVFS-enabled cluster scheduling simulation results.)
Quality of Service - Examples
Enforcing SLAs in scientific clouds: deadline-driven batch jobs under a Service Level Agreement (SLA). The system 1) tests the feasibility of the SLA and 2) if accepted, guarantees its fulfillment. The approach, built on a fuzzy control system, is independent of the underlying cloud infrastructure and should deal with performance fluctuations.
Niehorster, O., et al. "Enforcing SLAs in scientific clouds." Cluster Computing (CLUSTER), 2010 IEEE International Conference on. IEEE, 2010.

Agents autonomously prove the feasibility of the SLA and guarantee its fulfillment, meeting the deadline. The agents successfully deal with noise in the cloud that occurs when VMs are co-located: interference due to resource sharing (RAM, I/O, CPU).
Quality of Service - Examples
Sandpiper system: a hotspot detection algorithm determines when to resize or migrate VMs; a hotspot mitigation algorithm determines what and where to migrate and how many resources to allocate. VMs are migrated in decreasing order of VSR, the volume-to-size ratio (size = RAM footprint; volume = load).
(Figure: the Sandpiper architecture.)
Wood, T., et al. "Sandpiper: Black-box and gray-box resource management for virtual machines." Computer Networks 53.17 (2009): 2923-2938.

Sandpiper results: Sandpiper can resize the resources allocated to VMs; migrations occur if additional resources are not available, and a series of migrations resolves hotspots.
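The VSR ordering above can be sketched as follows. The volume formula used here, 1/((1-cpu)(1-net)(1-mem)), follows the definition given by Wood et al.; treat the exact form as an assumption if in doubt. The intuition is that sorting by VSR moves the most load per byte of VM state copied.

```python
# Sketch of Sandpiper's migration ordering: compute each VM's volume from its
# cpu, network and memory utilisations, divide by its RAM footprint (size) to
# get VSR, and migrate in decreasing VSR order.

def volume(cpu, net, mem):
    """Utilisations are in [0, 1); volume blows up as any resource saturates."""
    return 1.0 / ((1 - cpu) * (1 - net) * (1 - mem))

def migration_order(vms):
    """Sort VMs by decreasing volume-to-size ratio (VSR)."""
    vsr = lambda vm: volume(vm["cpu"], vm["net"], vm["mem"]) / vm["ram_mb"]
    return sorted(vms, key=vsr, reverse=True)
```

With equal loads, the VM with the smaller RAM footprint is migrated first, since it frees the same load at a lower migration cost.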
Approach
The goal: construct power- and failure-aware computing environments, in order to maximize the rate of jobs completed by their deadlines.
(Figure: spectrum from pure performance to higher service-level performance.)
Approach
It is an SLA-based approach, but the SLA agreement should consider user compensation if the deadline is missed. Virtual-to-physical resource mapping decisions consider both the power efficiency and the reliability level of the compute nodes, with dynamic updates of the virtual-to-physical configurations (CPU usage and migration).

Leverage virtualization tools:
Xen credit scheduler - dynamically update the cap parameter (CPU power consumption increases with CPU%, from 0 to 100).
Stop & copy migration - faster migrations, preferable for proactive failure management.
(Figure: timeline of VM placement on PM1-PM3 with a failure, stop & copy migration, and failure prediction accuracy.)
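Computing the cap value to set can be sketched as below. This is a hypothetical helper, not part of the approach's actual code: it derives the fraction of a core a VM needs to finish its remaining work by the deadline, which would then be applied through Xen's credit-scheduler interface.

```python
# Sketch: compute the Xen credit-scheduler cap (0-100, where 100 = one full
# CPU) needed for a VM to finish `remaining_mflop` of work by its deadline on
# a core rated at 800 MFLOPS (the rating used in the simulation setup below).
# Applying the value would use Xen's scheduler interface; here we only
# compute it.

def required_cap(remaining_mflop, deadline_s, core_mflops=800.0):
    """Smallest cap that meets the deadline, clamped to [1, 100]."""
    if deadline_s <= 0:
        return 100  # deadline already passed: give the VM everything
    needed_fraction = remaining_mflop / (core_mflops * deadline_s)
    return min(100, max(1, round(100 * needed_fraction)))
```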
System Overview
Cloud architecture: a private cloud of homogeneous PMs; a cluster coordinator manages user jobs; VMs are created and destroyed dynamically.
(Figure: private cloud management architecture.)
User jobs: a job is a set of independent tasks. Each task runs in a single VM, whose CPU-intensive workload is known. The number of tasks per job and the task deadlines are defined by the users.

Power model and capacity-reliability model.
(Figure: example of a power-efficiency curve, p1 = 175 W, p2 = 75 W.)
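A minimal sketch consistent with the curve parameters on this slide (p1 = 175 W at full load, p2 = 75 W idle). A linear interpolation in CPU utilisation is assumed here; the exact power model used in the talk may differ.

```python
# Power-model sketch: a PM draws P_IDLE watts at 0% CPU and P_FULL watts at
# 100%, linearly in between. Power efficiency (useful MFLOPS per watt) then
# rises with utilisation, because the idle power is amortised over more work,
# which is what motivates consolidation.

P_FULL, P_IDLE = 175.0, 75.0  # p1, p2 from the slide

def power(util):
    """Power draw (W) of a PM at CPU utilisation util in [0, 1]."""
    return P_IDLE + (P_FULL - P_IDLE) * util

def power_efficiency(util, core_mflops=800.0):
    """Useful work per watt (MFLOPS/W) at a given utilisation."""
    return (core_mflops * util) / power(util)
```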
Performance Analysis
Minimum Time Task Execution (MTTE) algorithm: given the slack time to accomplish task t and the capacity constraints of PM_i, it selects the PM_i that guarantees the minimum processing power required by the VM, increases power efficiency, and has higher reliability - but it reserves the maximum processing power.

Relaxed Time Task Execution (RTTE) algorithm: unlike MTTE, RTTE always reserves the minimum amount of processing power necessary to accomplish the task within its deadline (the host CPU cap, 0-100%, set in the Xen credit scheduler). However, RTTE is work-conserving.
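The contrast between the two policies can be sketched as below. The PM-scoring rule (prefer higher reliability, then higher current utilisation, which amortises idle power) is an illustrative simplification of the combined power-efficiency/reliability criterion, not the exact MTTE/RTTE formulation.

```python
# Sketch contrasting the two reservation policies. Both pick a PM with enough
# spare capacity; MTTE then reserves all the PM's spare processing power for
# the task, while RTTE reserves only the minimum needed to meet the deadline
# (the cap later set in the Xen credit scheduler). Scoring is illustrative.

def select_pm(pms, min_power):
    """Prefer reliable PMs, then PMs whose utilisation is already high
    (better power efficiency), subject to min_power spare capacity."""
    feasible = [p for p in pms if p["spare"] >= min_power]
    if not feasible:
        return None
    return max(feasible, key=lambda p: (p["reliability"], p["utilisation"]))

def reservation(pm, min_power, policy):
    """MTTE reserves everything spare; RTTE reserves only the minimum."""
    return pm["spare"] if policy == "MTTE" else min_power
```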
Performance Analysis
Implementation considerations: stabilization to avoid multiple migrations.
Algorithms compared to ours:
Common Best-Fit (CBFIT) - selects the PM with the maximum power efficiency and does not consider resource reliability.
Optimistic Best-Fit (OBFIT).
Pessimistic Best-Fit (PBFIT).

Simulation setup:
50 PMs, each modeled with one CPU core with performance equivalent to 800 MFLOPS.
VMs require 128 MB to 1024 MB of RAM; the stop & copy migration overhead depends on the RAM size.
100 synthetic jobs, each composed on average of 10 CPU-intensive tasks.
Failure instants follow a Weibull distribution with shape parameter 0.8 and MTBF = 200 minutes.
Failed PMs stay unavailable for a period modeled by a Lognormal distribution, with mean time set to 20 minutes and varying up to 150 minutes.
Task deadlines are set to 10% more than their minimum execution time.
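The failure model in this setup can be sketched directly with the standard-library generators. The Weibull scale is derived from the stated MTBF and shape; the Lognormal sigma is an assumption (the slide only fixes the mean repair time at 20 minutes).

```python
# Sketch of the simulation's failure model: inter-failure times ~ Weibull
# (shape 0.8, mean = MTBF = 200 min); repair durations ~ Lognormal with mean
# 20 min. Weibull mean = scale * Gamma(1 + 1/shape), so we solve for scale;
# Lognormal mean = exp(mu + sigma^2/2), so mu is derived from an assumed sigma.
import math
import random

SHAPE = 0.8
MTBF_MIN = 200.0
SCALE = MTBF_MIN / math.gamma(1 + 1 / SHAPE)

def failure_trace(horizon_min, rng):
    """Return (failure_time, repair_duration) pairs within the horizon."""
    events, t = [], 0.0
    while True:
        t += rng.weibullvariate(SCALE, SHAPE)
        if t >= horizon_min:
            return events
        sigma = 1.0                               # assumed spread
        mu = math.log(20.0) - sigma ** 2 / 2      # gives mean repair ~20 min
        events.append((t, rng.lognormvariate(mu, sigma)))
```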
Performance Analysis
Metrics:
Completion rate of user jobs.
Working-efficiency - measures the quantity of useful work done (i.e. completed user jobs) per unit of consumed power.
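The two metrics can be sketched as simple ratios. The exact normalisation used in the evaluation may differ; here useful work is taken as the MFLOP of completed jobs, an assumption consistent with the MFLOPS-rated PMs in the setup.

```python
# Metric sketch: completion rate is the fraction of user jobs finished by
# their deadlines; working-efficiency is the useful work of completed jobs
# divided by the total energy consumed (MFLOP per joule).

def completion_rate(jobs):
    """jobs: list of dicts with a boolean 'completed' flag."""
    return sum(j["completed"] for j in jobs) / len(jobs)

def working_efficiency(jobs, energy_joules):
    """Useful work (MFLOP of completed jobs) per joule consumed."""
    useful = sum(j["mflop"] for j in jobs if j["completed"])
    return useful / energy_joules
```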
Performance Analysis
Google Cloud tracelogs:
The median length of a job is 3 minutes, and the majority of jobs run in less than 15 minutes, although a number of jobs run longer than 300 minutes.
Task lengths follow a Lognormal distribution.
CPU usage, varying from near 0% to around 25%, follows a Lognormal distribution.
3614 synthetic jobs, for a total of 10357 tasks.
MTBF = 200 minutes.
Migrations occur due to proactive failure management only.
Energy Efficiency Improvement
The goal: a mechanism to detect energy-optimization opportunities while maintaining fault tolerance in the computing environment; find near-optimal values to correctly tune the condition-detection mechanism; dynamically update the virtual-to-physical configurations (CPU usage and migration).
(Figure: timeline of VM placement on PM1-PM3 with a failure, stop & copy migration, and failure prediction accuracy.)

Consolidation results (figures: without consolidation vs. with consolidation).
Conclusions
Cloud computing opens new challenges:
Energy efficiency (more MFLOP/Joule).
Dynamic load balancing.
VM interference modeling due to resource sharing (CPU, cache, I/O).
CPU-intensive and data-intensive jobs; data locality.
Scalability (distributed control).
Autonomic Computing.
CERN Cloud infrastructure: an MSc dissertation (MIEIC) to study and develop a resource management algorithm for the CERN cloud.