An Adaptive Utilization Accelerator for Virtualized Environments LCCC Workshop in Cloud Control

Similar documents
SLA-aware Resource Over-Commit in an IaaS Cloud

Capacity Management in IaaS Cloud

A Distributed Approach to Dynamic VM Management

Advanced Load Balancing Mechanism on Mixed Batch and Transactional Workloads

Heterogeneous Workload Consolidation for Efficient Management of Data Centers in Cloud Computing

Energy Constrained Resource Scheduling for Cloud Environment

Dynamic Resource allocation in Cloud

Characterizing Task Usage Shapes in Google s Compute Clusters

This is an author-deposited version published in : Eprints ID : 12902

Towards an understanding of oversubscription in cloud

An Analysis of First Fit Heuristics for the Virtual Machine Relocation Problem

Energy-Aware Multi-agent Server Consolidation in Federated Clouds

Task Scheduling for Efficient Resource Utilization in Cloud

Phoenix Cloud: Consolidating Different Computing Loads on Shared Cluster System for Large Organization

ENERGY EFFICIENT VIRTUAL MACHINE ASSIGNMENT BASED ON ENERGY CONSUMPTION AND RESOURCE UTILIZATION IN CLOUD NETWORK

Affinity Aware VM Colocation Mechanism for Cloud

Dynamic resource management for energy saving in the cloud computing environment

Workload Analysis and Demand Prediction for the HP eprint Service Vipul Garg, Ludmila Cherkasova*, Swaminathan Packirisami, Jerome Rolia*

Takahiro Hirofuchi, Hidemoto Nakada, Satoshi Itoh, and Satoshi Sekiguchi

International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May ISSN

1. Simulation of load balancing in a cloud computing environment using OMNET

ENERGY EFFICIENT CONTROL OF VIRTUAL MACHINE CONSOLIDATION UNDER UNCERTAIN INPUT PARAMETERS FOR THE CLOUD

Efficient Virtual Machine Sizing For Hosting Containers as a Service

Green Cloud Computing 班 級 : 資 管 碩 一 組 員 : 黃 宗 緯 朱 雅 甜

Optimized Resource Provisioning based on SLAs in Cloud Infrastructures

Figure 1. The cloud scales: Amazon EC2 growth [2].

Cloud/SaaS enablement of existing applications

Cloud Management: Knowing is Half The Battle

Comparison of Request Admission Based Performance Isolation Approaches in Multi-tenant SaaS Applications

Power and Performance Modeling in a Virtualized Server System

Best Practices for Monitoring Databases on VMware. Dean Richards Senior DBA, Confio Software

Payment minimization and Error-tolerant Resource Allocation for Cloud System Using equally spread current execution load

supported Application QoS in Shared Resource Pools

Virtualization Technology using Virtual Machines for Cloud Computing

Dynamic Resource Allocation in Software Defined and Virtual Networks: A Comparative Analysis

Dynamic Management of Applications with Constraints in Virtualized Data Centres

Task Scheduling Techniques for Minimizing Energy Consumption and Response Time in Cloud Computing

Capacity Planning and Power Management to Exploit Sustainable Energy

A Survey of Energy Efficient Data Centres in a Cloud Computing Environment

Effective VM Sizing in Virtualized Data Centers

Dynamic Load Balancing of Virtual Machines using QEMU-KVM

Managing Capacity Using VMware vcenter CapacityIQ TECHNICAL WHITE PAPER

Ecole des Mines de Nantes. Journée Thématique Emergente "aspects énergétiques du calcul"

Power Aware Live Migration for Data Centers in Cloud using Dynamic Threshold

ENERGY EFFICIENT AND INTERFERENCE AWARE PROVISIONING IN VIRTUALIZED SERVER CLUSTER ENVIRONMENT

Precise VM Placement Algorithm Supported by Data Analytic Service

Setting deadlines and priorities to the tasks to improve energy efficiency in cloud computing

Reference Architecture and Best Practices for Virtualizing Hadoop Workloads Justin Murray VMware

International Journal of Advance Research in Computer Science and Management Studies

Towards an Improved Data Centre Simulation with DCSim

INCREASING SERVER UTILIZATION AND ACHIEVING GREEN COMPUTING IN CLOUD

Hybrid resource provisioning for clouds

Falloc: Fair Network Bandwidth Allocation in IaaS Datacenters via a Bargaining Game Approach

Satisfying Service Level Objectives in a Self-Managing Resource Pool

SLA-Driven Simulation of Multi-Tenant Scalable Cloud-Distributed Enterprise Information Systems

Design and Implementation of IaaS platform based on tool migration Wei Ding

MAPREDUCE [1] is proposed by Google in 2004 and

IBM Spectrum Protect in the Cloud

Kronos Workforce Central on VMware Virtual Infrastructure

Ensuring Collective Availability in Volatile Resource Pools via Forecasting

RESOURCE ACCESS MANAGEMENT FOR A UTILITY HOSTING ENTERPRISE APPLICATIONS

Alfresco Enterprise on Azure: Reference Architecture. September 2014

Comparison of Windows IaaS Environments

A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing

Microsoft Office SharePoint Server 2007 Performance on VMware vsphere 4.1

Infrastructure as a Service (IaaS)

AN ADAPTIVE DISTRIBUTED LOAD BALANCING TECHNIQUE FOR CLOUD COMPUTING

USING VIRTUAL MACHINE REPLICATION FOR DYNAMIC CONFIGURATION OF MULTI-TIER INTERNET SERVICES

OCCI and Security Operations in OpenStack - Overview

Utilization-based Scheduling in OpenStack* Compute (Nova)

Table of Contents. P a g e 2

Keywords Distributed Computing, On Demand Resources, Cloud Computing, Virtualization, Server Consolidation, Load Balancing

Run-time Resource Management in SOA Virtualized Environments. Danilo Ardagna, Raffaela Mirandola, Marco Trubian, Li Zhang

Monitoring Databases on VMware

Newsletter 4/2013 Oktober

Elastic VM for Rapid and Optimum Virtualized

Power Consumption Based Cloud Scheduler

POSIX and Object Distributed Storage Systems

A Batch Computing Service for the Spot Market. Supreeth Subramanya, Tian Guo, Prateek Sharma, David Irwin, Prashant Shenoy

1000 islands: an integrated approach to resource management for virtualized data centers

HBC How to build your cloud - Steps to Extend your Datacenter

How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda

An Experimental Study of Load Balancing of OpenNebula Open-Source Cloud Computing Platform

WHITE PAPER. SQL Server License Reduction with PernixData FVP Software

GUEST OPERATING SYSTEM BASED PERFORMANCE COMPARISON OF VMWARE AND XEN HYPERVISOR

Oracle Hyperion Financial Management Virtualization Whitepaper

How To Manage Cloud Service Provisioning And Maintenance

Postprint.

Energy Efficient Resource Management in Virtualized Cloud Data Centers

Exploiting Cloud Heterogeneity to Optimize Performance and Cost of MapReduce Processing

Experimental Evaluation of Energy Savings of Virtual Machines in the Implementation of Cloud Computing

Characterize Performance in Horizon 6

A dynamic optimization model for power and performance management of virtualized clusters

Avoid Paying The Virtualization Tax: Deploying Virtualized BI 4.0 The Right Way. Ashish C. Morzaria, SAP

Non-intrusive Slot Layering in Hadoop

PERFORMANCE ANALYSIS OF PaaS CLOUD COMPUTING SYSTEM

Migration of Virtual Machines for Better Performance in Cloud Computing Environment

3-D Cloud Monitoring: Enabling Effective Cloud Infrastructure and Application Management

3. RELATED WORKS 2. STATE OF THE ART CLOUD TECHNOLOGY

Transcription:

An Adaptive Utilization Accelerator for Virtualized Environments LCCC Workshop in Cloud Control David Breitgand, Zvi Dubitzky, Amir Epstein, Oshrit Feder, Alex Glikson, Inbar Shapira and Giovanni Toffetti Cloud Operating Systems Technologies IBM Haifa Research Lab

What is the problem? VM sprawl in private clouds VM sprawl Proliferation of inactive / unused VMs in clouds Stems from cloud provisioning model (and relative lack of control) In the absence of resource utilization models, VM provisioning is based on nominal resource demand Looking at a VM: 2 VCPUs 4 GB RAM Nominal resource demand Looking at a DC:!= Resource utilization 100% My private cloud 2 Fully utilized nominal capacity Low Infrastructure utilization VM provisioning based on nominal resource demand Unsatisfied VM placement demand

Common solution: resource over-commit PCPU? VCPU VCPU VCPU VCPU VCPU VCPU VCPU Over-commit Ratio (OCR) = #VCPUs / #PCPUs OCR is a cloud-wide configuration parameter Default CPU OCR for Openstack: 16(!) OCR => infrastructure utilization OCR => risk of congestion performance degradation 3 [OpenStack Operations Guide, 2014-03-06]

Our proposal Adaptive over-commit RATIONALE: There is NO right fixed OCR VMs activity vs. idleness are application-dependent and vary over time Need for automated solution GOALS/FEATURES: Increase DC utilization Minimize performance degradation Transparent to VM tenants No assumptions/forecast on VM resource consumption 4

Pulsar high-level functioning Provision VM VM Active IBM Adaptive Utilization Accelerator for Virtualized Environments (PULSAR): Simple VM idleness detector (CPU util threshold) Claim resources from idle VMs by throttling them (reducing their resource reservation, cgroups in KVM) Use adjusted capacity (considering throttling) to provision and place more VMs in the system yes VM idle? VM Throttled VM idle? Unthrottle in-place no no Success? Migrate to other host no yes yes Success? no Stop VM admission Record violation yes 5

Implementation Full implementation as extension to OpenStack Havana release (soon IceHouse) Idleness detector running on each host Adjusted capacity filter Dynamic resource manager Admission control mechanism 6

Pulsar evaluation Experiments Smaller testbed Full implementation (Synthetic) trace-driven workload: boulders and sand model Measure performance degradation Simulations Large testbeds Nova scheduler + testbed emulator (pymoc) Synthetic and real datacenter trace-driven workload Estimate performance degradation through host congestion Compare with theoretical upper-bound oracle scheduler 7

Synthetic workload simulation Based on OpenStack code Medium-size scenario 100 hosts, 24 PCPUs each (2400 total cores) 1 week synthetic workload using boulders and sand model Boulders: long-living VMs with periodic demand pattern Sand: short-lived VMs CPUintensive jobs (dev-test, mapreduce) Markov-chain demand model Compare with fixed OCR (1,2.5,3) and Oracle (theoretical upper bound) 8

Trace-based simulation Pseudo-random sample from Google cluster traces [Wilkes 2011] 800 hosts, 48 cores each [38400 cores] Pulsar: +20% admitted VMs -50% congestion Limits of: Reactive admission control No VM preemption / prorities 10 runs avg. Fixed OCR 1.5 PULSAR Total admitted VMs 455k 548k Avg. Congestion 7.27% 3.12% 9

Experimental evaluation Trace-driven experiment (trace generated with boulder/sand model) Daytrader (DT) Web app [3VCPU, 30min period] Sudoku solver (SD) [1VCPU, 90min avg lifetime, 5% probability of switching btw idle/active each minute] Very active workload, very low maximum achievable OCR [1.5 max from Oracle] Testbed Openstack Controller node [Supermicro 8 Xeon E5420 2.5 GHz cores, 8GB RAM] 2 Openstack Compute nodes [IBM System X3550 M3, 24 Xeon X5680 3.3 GHz cores, 28GB RAM] Runs (averaged over 20 executions): R1: 4 DTs + 36 SDs (group A), OCR=1 R2: PULSAR with group A + SDs from a Poisson process with 2 minutes inter-arrival time (group B) R3: fixed OCR=1.27 (average obtained by Pulsar) groups A+B Run DT avg RT (STD) [ms] SD avg thr (STD) [Hz] Host 1 avg util [%] Host 2 avg util [%] R1 28.8 (6.4) 61.2 (13) 57.8 62.3 R2 34.6 (9.7) 50.25 (14.1) 79.1 80.4 R3 41.6 (11.8) 46.95 (12.85) 85 84 10

Conclusion From our evaluation: PULSAR is adaptive to changes in resource utilization It increases infrastructure utilization Limited host congestion Limited number of VM migrations Outperforming any fixed-ocr solution Future work: Use improved idleness detector / load predictors Proactive admission control / VM priorities - preemption Larger experiments! Questions? 11

Backup slides 12

Related work Many papers on demand prediction for stable VM population [Breitgand 2012] [Chen 2011] [Meng 2010] [Gmach 2007] We consider dynamic VM population, discrepancy between nominal and actual resource usage, adaptive over-commit [Gmach 2012] [Yanagisawa 2013] assume static VM population and no overcommit model, use past VM demand patterns to predict future demand We left out prediction of future demand on purpose assuming dynamic VM population, albeit we could leverage this information [Carrera 2012] aim at fair placement decision by using a model of expected performance given a resource allocation for each workload [Blagodurov 2013] requires application performance monitoring instrumentation and knowledge of resource consumption profiles to classify applications as batch or interactive [Wuhib 2012] use average resource utilization over a sliding window to implement different placement policies (e.g., consolidation). The same solution can be applied for adaptive overcommit. However churn and high utilization variation cause number of required migrations to grow quickly 13

References [Wilkes 2011]: John Wilkes, More Google cluster data, Google research blog, Nov 2011 [Breitgand 2012] D. Breitgand and A. Epstein, Improving Consolidation of Virtual Machines with Risk- Aware Bandwidth Oversubscription in Compute Clouds, in INFOCOM, 2012. [Chen 2011] M. Chen, H. Zhang, Y.-Y. Su, X. Wang, G. Jiang, and K. Yoshihira, Effective VM Sizing in Virtualized Data Centers, in IEEE/IFIP IM 11, Dublin, Ireland, May 2011. [Meng 2010] X. Meng, C. Isci, J. Kephart, L. Zhang, E. Bouillet, and D. Pendarakis, Efficient Resource Provisioning in Compute Clouds via VM Multiplexing, in The 7th IEEE/ACM International Conference on Autonomic Computing and Communications, Washington, DC, USA, Jun 2010. [Gmach 2007] D. Gmach, J. Rolia, L. Cherkasova, and A. Kemper, Workload Analysis and Demand Prediction of Enterprise Data Center Applications, 2007 IEEE 10th International Symposium on Workload Characterization, pp. 171 180, Sep. 2007. [Gmach 2012] D. Gmach, J. Rolia, and L. Cherkasova, Selling T-shirts and Time Shares in the Cloud, Cloud and Grid Computing, 2012. [Yanagisawa 2013] H. Yanagisawa, T. Osogami, and R. Raymond, Dependable Virtual Machine Allocation, in Infocom, 2013, pp. 653 661. [Blagodurov 2013] S. Blagodurov, D. Gmach, M. Arlitt, Y. Chen, C. Hyser, and A. Fedorova, Maximizing Server Utilization while Meeting Critical SLAs via Weight-Based Collocation Management, in IFIP/IEEE IM 13, 2013. [Wuhib 2012] F. Wuhib, R. Stadler, and H. Lindgren, Dynamic resource allocation with management objectives: Implementation for an OpenStack cloud, in CNSM 12, 2012, pp. 309 315. 14