Towards energy-aware scheduling in data centers using machine learning




Josep Lluís Berral, Íñigo Goiri, Ramon Nou, Ferran Julià, Jordi Guitart, Ricard Gavaldà, and Jordi Torres
Universitat Politècnica de Catalunya
BSC-CNS, Barcelona Supercomputing Center
e-Energy '10, April 2010

Context: Energy, Autonomic Computing and Machine Learning

Keywords:
  - Autonomic Computing (AC): automation of management
  - Machine Learning (ML): learning patterns and predicting them
Applying AC and ML to energy control:
  - Self-management must include energy policies
  - Optimization mechanisms are becoming more complex...
  - ...and they can be improved through automation and adaptation
Challenges for autonomic energy management:
  - Datacenter policies require adaptation towards constant optimization
  - Complexity can be reduced through modeling and learning
  - If a system follows any pattern, ML may be able to find an accurate model to help the decision makers and improve policies

Introduction

Self-management looking towards energy saving:
  - Apply the well-known consolidation strategy
Consolidation strategy:
  - Reduce the number of turned-on machines by grouping tasks on fewer machines
  - Turn off as many idle machines as possible (but not all!)
Main contributions:
  - Consolidate tasks in a datacenter environment
  - Predict information a priori to resolve uncertainty and play it safe
  - Design adequate metrics to compare consolidation solutions
  - Turn machines on/off using an SLA vs. power trade-off method

Energy-Aware Scheduling

Consolidation:
  - Execute all tasks with the minimum number of machines
  - Unused machines are turned off
  - Known policies: Random, Greedy policies, (Dynamic) Backfilling
Policies and constraints:
  - SLA fulfillment must not degrade excessively
  - Operations must reduce or maintain energy consumption
  - Turn off as many machines as possible?
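The consolidation idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' scheduler: the `Host` class, the `place` and `consolidate` helpers, and the rule "pick the powered-on host with the least free CPU that still fits the task" are hypothetical choices made for the example. The default of 4 CPUs per host mirrors the simulated datacenter described later in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A machine with a fixed number of CPUs (hypothetical model)."""
    capacity: int = 4                      # CPUs per host
    tasks: list = field(default_factory=list)

    @property
    def used(self):
        return sum(self.tasks)

    @property
    def free(self):
        return self.capacity - self.used


def place(hosts, cpu):
    """Put a task on the fullest powered-on host that can still hold it.

    A host is treated as powered on when it runs at least one task.
    Only if no powered-on host fits is an unused host started.
    Returns the chosen host, or None when nothing fits.
    """
    powered_on = [h for h in hosts if h.used > 0 and h.free >= cpu]
    if powered_on:
        target = min(powered_on, key=lambda h: h.free)
    else:
        unused = [h for h in hosts if h.used == 0 and h.free >= cpu]
        if not unused:
            return None
        target = unused[0]
    target.tasks.append(cpu)
    return target


def consolidate(task_cpus, n_hosts, capacity=4):
    """Place every task in arrival order and return the resulting hosts."""
    hosts = [Host(capacity) for _ in range(n_hosts)]
    for cpu in task_cpus:
        if place(hosts, cpu) is None:
            raise RuntimeError("no host can fit a task of %d CPUs" % cpu)
    return hosts


hosts = consolidate([2, 2, 1, 3, 1, 3], n_hosts=5)
machines_on = sum(1 for h in hosts if h.used > 0)
```

Hosts that end up with no tasks are the candidates for being turned off; how many of them should stay idle but powered on is the separate trade-off discussed in the evaluation.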

EAS: Machine Learning application (I)

Prediction a priori:
  - Deal with uncertainty
  - Anticipate future information
Applying Machine Learning:
  - Relevant variables for decision making are only available a posteriori
  - ML creates a model from past examples

[Diagram: ended jobs -> training dataset (a posteriori data) -> ML -> model; new job (data to predict) -> model -> estimated data for the new job]

Desired information a priori:
  - SLA fulfillment level: i.e. we don't know the exact finish time per task
  - Consumption: i.e. we don't know the consumption before placing a task
Learn a model to induce:
  <Info. running tasks, Info. host> -> <SLA fulfillment, Power consumption>
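The train-then-predict loop described here can be illustrated with a one-variable least-squares fit. This is a minimal sketch, assuming a single input feature (CPUs in use on a host) and a single target (host power); the function names `fit_line` and `predict` are hypothetical, and the numbers are synthetic values chosen for illustration, not measurements from the talk. The evaluation later mentions Linear Regression and M5P over richer inputs, which this sketch does not attempt to reproduce.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Estimate the target for a new, not yet observed input."""
    slope, intercept = model
    return slope * x + intercept


# A posteriori data from past observations (synthetic, illustrative only):
# CPUs in use on a host, and the power that host drew.
cpus_used = [0, 1, 2, 3, 4]
watts = [100, 140, 180, 220, 260]

model = fit_line(cpus_used, watts)

# A priori estimate for a placement that has not been made yet.
estimated_watts = predict(model, 2.5)
```

The same pattern applies to the SLA side: the finished jobs provide the labelled examples, and the model supplies an estimate before the scheduler commits to a placement.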

EAS: Machine Learning application (II)

Information a posteriori:
  - R_h: average SLA fulfillment level of jobs in host
  - C_h: host consumption
  - Finished jobs: information about ended jobs
  - Host: information about host capabilities
Learn a model to induce:
  <Running jobs, Host> -> <R_h, C_h>
Used variables:
  Post-mortem data:
    - Finished job: <Job Info, T_start, T_end, T_user, SLA_Fact> -> R_j
    - Host consumption: <Usage_Res> -> C_h
  Available data:
    - Running job: <CPU Usage, T_start, T_now, T_user, SLA_Fact> -> R_j
    - Host consumption: <CPU Available> -> C_h
    - Host SLA fulfillment: aggregation of R_j -> R_h
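To make the variables above concrete, the sketch below computes a per-job fulfillment R_j and aggregates it into R_h as a plain average, which matches the slide's description of R_h. The formula used for R_j is an assumption made purely for illustration, since the slide does not define it: a job is taken to be fully satisfied when it finishes within T_user * SLA_Fact, with fulfillment dropping linearly to zero at twice that deadline. The function names are hypothetical.

```python
def job_fulfillment(t_start, t_end, t_user, sla_fact):
    """R_j in [0, 1] for a finished job.

    ASSUMPTION (not from the slides): the deadline is t_user * sla_fact,
    and fulfillment decays linearly from 1 to 0 between one and two
    deadlines of elapsed time.
    """
    deadline = t_user * sla_fact
    elapsed = t_end - t_start
    if elapsed <= deadline:
        return 1.0
    return max(0.0, 1.0 - (elapsed - deadline) / deadline)


def host_fulfillment(job_levels):
    """R_h: average SLA fulfillment level of the jobs in a host."""
    if not job_levels:
        return 1.0  # assumed convention: an empty host violates nothing
    return sum(job_levels) / len(job_levels)


on_time = job_fulfillment(t_start=0, t_end=100, t_user=100, sla_fact=1.5)
late = job_fulfillment(t_start=0, t_end=225, t_user=100, sla_fact=1.5)
r_h = host_fulfillment([on_time, late])
```

For a running job the same function could be evaluated with T_now in place of T_end, which is how the "available data" tuple on the slide differs from the post-mortem one.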

EAS: Machine Learning application (III)

Backfilling and Dynamic Backfilling policies:
  - Purpose: fill turned-on hosts before starting off-line ones
  - When a task enters, it is always put on the most fillable host
  - At each scheduling round, move tasks to get more consolidation
Applying Machine Learning:
  - We learn the SLA fulfillment impact and consumption impact for each past schedule
  - For each possible task allocation <host, jobs on host + new job>:
      - Estimation of resulting SLA fulfillment
      - Estimation of resulting power consumption
      - If they don't degrade, the allocation is viable
  - Dynamic Backfilling: replace the static data with estimated data
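The allocation loop on this slide can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: `choose_host`, the `min_sla` threshold used to decide that SLA "does not degrade", and the tie-breaking rule (smallest estimated power increase wins) are hypothetical. The two stub predictors stand in for learned models and use made-up coefficients.

```python
def choose_host(hosts, new_job, predict_sla, predict_power, min_sla=0.95):
    """Return the name of a viable host for new_job, or None.

    hosts maps a host name to the list of jobs it currently runs.
    predict_sla and predict_power are models that take a list of jobs
    and return an estimate for a host running exactly those jobs.
    """
    best = None
    for name, jobs in hosts.items():
        candidate = jobs + [new_job]
        # Estimated SLA fulfillment if the job were placed here.
        if predict_sla(candidate) < min_sla:
            continue  # allocation not viable
        # Estimated extra power caused by this placement.
        extra = predict_power(candidate) - predict_power(jobs)
        if best is None or extra < best[1]:
            best = (name, extra)
    return None if best is None else best[0]


# Stub predictors (ASSUMED, illustrative): jobs are CPU demands on 4-CPU hosts.
def stub_sla(jobs):
    load = sum(jobs)
    return 1.0 if load <= 4 else 4 / load


def stub_power(jobs):
    # An empty host is treated as switched off and draws nothing.
    return 0 if not jobs else 200 + 30 * sum(jobs)


hosts = {"h1": [2, 1], "h2": [], "h3": [3]}
small = choose_host(hosts, 1, stub_sla, stub_power)
large = choose_host(hosts, 2, stub_sla, stub_power)
```

With these stubs the one-CPU job goes to an already running host, because waking the empty host costs far more estimated power, while the two-CPU job is sent to the empty host because both running hosts would be overcommitted.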

Simulation and Metrics

Self-created simulator:
  - Simulates a data center able to execute tasks according to different scheduling policies
  - Takes into account CPU consumption and energy
  - Able to turn simulated machines on/off
Metrics:
  - There is no standard approach to compare power efficiency
  - We introduce metrics to compare adaptive solutions: working nodes, running nodes, CPU usage, power consumption, SLA fulfillment level...

Evaluation (I): Shutting down machines

Power vs. SLA fulfillment trade-off:
  - Determine when to shut down idle nodes and when to turn on new ones
Find the adequate number of idle powered-on machines:
  - It depends on the number of running tasks
  - Determine the range of idle machines (minimum and maximum)
Trade-off between energy and required resources:
  - At what load to start off-line machines or shut down idle ones
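The minimum/maximum range of idle machines suggests a simple threshold rule, sketched below. The function name `power_action` and its sign convention are hypothetical, and the bounds are passed in as plain parameters; the slide states that the right values depend on the number of running tasks but gives no formula, so none is assumed here.

```python
def power_action(idle_on, offline, min_idle, max_idle):
    """Decide how many machines to switch on or off.

    Returns a positive count of off-line machines to start, a negative
    count of idle machines to shut down, or 0 when the number of idle
    powered-on machines is already inside [min_idle, max_idle].
    """
    if idle_on < min_idle:
        # Too little spare capacity: start machines, limited by what exists.
        return min(min_idle - idle_on, offline)
    if idle_on > max_idle:
        # Too much spare capacity: shut down the surplus idle machines.
        return -(idle_on - max_idle)
    return 0


start_two = power_action(idle_on=1, offline=10, min_idle=3, max_idle=6)
stop_three = power_action(idle_on=9, offline=10, min_idle=3, max_idle=6)
```

A wide gap between the two bounds avoids switching machines on and off at every small change in load, at the cost of keeping more idle machines powered.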

Evaluation (II): Consolidation

Experimental environment:
  - Simulated datacenter with 400 hosts (4 CPUs per host)
  - Workload: fixed CPU size tasks and variable CPU size tasks
  - Use of Linear Regression and M5P for SLA and power prediction
Experimental results:
  - Consolidation techniques (Backfilling and Dynamic BF) perform better than the other techniques
  - SLA fulfillment around 99%
  - CPU utilization more stable and lower power consumption

Evaluation (III): Machine Learning

Experimental results (II):
  - Dynamic BF + ML performs better under uncertainty (service and heterogeneous workloads)
  - Accuracy around 98.5% on predictions
  - Detail: the values with the highest estimates always had the highest accuracy (kWh)

Conclusions and Future Work

Challenge and contribution:
  - Vertical and intelligent consolidation methodology
  - Metrics to evaluate different consolidation approaches
  - Predict application SLA timings and power consumption to decide scheduling
Experimental results:
  - Consolidation-aware techniques:
      - Improve power efficiency
      - Compare backfilling with standard techniques
  - Machine Learning method:
      - Close to consolidation techniques
      - Better when information is inaccurate
Current and future work:
  - More complex SLA fulfillment (response time, throughput, ...)
  - More complex resource elements (CPU, memory, I/O elements)
  - More elaborate policy optimization (utility functions)
  - Addition of virtualization overheads

Thank you for your attention