Algorithms for sustainable data centers Adam Wierman (Caltech) Minghong Lin (Caltech) Zhenhua Liu (Caltech) Lachlan Andrew (Swinburne) and many others
IT is an energy hog. The electricity use of data centers is 2-3% of the US total, and it is growing 12% a year! Total US use grows ~1% a year!
Huge push from academia & industry Lots of engineering improvements
Power Usage Effectiveness (PUE) = Power to building / Power to computing. PUE has improved from >3 down to <1.1. Lower PUE means lower energy consumption and more sustainable operation.
Power Usage Effectiveness (PUE) = Power to building / Power to computing. The building is efficient; it's time to work on the algorithms.
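The PUE ratio above is a one-line computation; here is a minimal sketch with hypothetical facility numbers (the 3000/1100 kW figures are illustrative, chosen only to match the >3 and <1.1 endpoints on the slide):

```python
# Illustrative PUE calculation; the kW figures are hypothetical.
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness = total facility power / IT power."""
    return total_facility_kw / it_equipment_kw

# A legacy facility: 3000 kW total to deliver 1000 kW of computing.
legacy = pue(3000, 1000)
# A state-of-the-art facility: 1100 kW total for the same 1000 kW.
modern = pue(1100, 1000)
```

A PUE of 1.0 would mean every watt goes to computing; the gap above 1.0 is cooling, power distribution, and other overhead.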
This talk: An overview of algorithmic questions & challenges 1. Two examples 2. A general model & some results 3. A case study
Example 1: Dynamic resizing in a data center. Workload: highly non-stationary. Idling servers use 30-50% of peak power, so dynamically resizing the pool of active servers is appealing (and controversial!). Goal: Adapt provisioning to minimize cost. Challenges: Switching is costly & prediction is hard.
Example 1: Dynamic resizing in a data center. Interactive workload: highly non-stationary; idling servers use 30-50% of peak power, so the active servers are resized dynamically. Batch workload: large, delay-tolerant jobs, typically with deadlines on the order of hours, enabling load shifting and valley filling.
Example 1: Dynamic resizing in a data center. Inputs: electricity prices, solar availability, PUE. Controls: dynamic resizing of the active servers, load shifting, valley filling.
Example 2: Geographical load balancing. Data centers: diverse, time-varying electricity prices, renewable (e.g., solar) availabilities, cooling efficiencies, etc. Proxy/mapping nodes: highly non-stationary workload.
Example 2: Geographical load balancing Goal: Adapt routing & provisioning to minimize cost. Challenges: Switching is costly & prediction is hard.
Dynamic resizing: [Chase et al. 01] [Pinheiro et al. 01] [Chen et al. 05] [Gandhi et al. 09] [Khargharia et al. 10] [Kansal et al. 10] [LWAT10]. Geographical load balancing: [Pakbaznia et al. 09] [Qureshi et al. 09] [Rao et al. 10] [Stanojevic et al. 10] [Wendell et al. 10] [Le et al. 10] [LLWLA11] [LLWAL11] [LAW12]. Goal: Adapt routing & provisioning to minimize cost. Challenges: Switching is costly & prediction is hard.
A (familiar) model
The model: at each time t, the workload is λ_t and the control x_t is the number of active servers. Time slots are ~10 min; the horizon T is ~1 month.
Note: λ_t and x_t can be scalar (e.g., a homogeneous data center) or vector (e.g., geographical load balancing).
Data center goal: min_{x_t ∈ F} Σ_t [ c_t(x_t) (operating cost) + (x_t − x_{t−1})^+ (switching cost) ], subject to stability constraints.
Data center goal: min_{x_t ∈ F} Σ_t [ c_t(x_t) (operating cost) + (x_t − x_{t−1})^+ (switching cost) ], where the stability constraints are 0 ≤ x_t ≤ N_max and x_t ≥ SLA(λ_t) (enough capacity to meet the service-level agreement).
Data center goal: min_{x_t ∈ F} Σ_t [ c_t(x_t) (operating cost) + (x_t − x_{t−1})^+ (switching cost) ]
Data center goal: min_{x_t ∈ F} Σ_t c_t(x_t) + β(x_t − x_{t−1})^+. E.g., in a homogeneous data center, the operating cost is c_t(x_t; λ_t) = min_{λ_1 + … + λ_{x_t} = λ_t} Σ_{i=1}^{x_t} g_t(λ_i), where g_t(l) = d(l) (delay cost) + p_t·e(l) (energy cost), and β(x_t − x_{t−1})^+ is the cost to switch servers on/off.
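The homogeneous cost model above is easy to sketch in code. Since splitting the load evenly across the x active servers is optimal when the per-server cost g_t is convex, c_t(x; λ) = x·g_t(λ/x). The specific delay and energy curves below (M/M/1-style delay, idle power at 50% of peak) and the value of β are illustrative assumptions, not the talk's exact choices:

```python
# Sketch of the homogeneous-data-center cost model. The per-server
# functions d and e below are hypothetical convex choices.
def g(load: float, price: float) -> float:
    """Per-server cost g_t(l) = d(l) + p_t * e(l)."""
    assert 0 <= load < 1, "per-server load must stay below capacity"
    delay = load / (1 - load)        # M/M/1-style delay cost (assumption)
    energy = 0.5 + 0.5 * load        # idle power ~50% of peak (slide above)
    return delay + price * energy

def operating_cost(x: int, lam: float, price: float) -> float:
    """c_t(x; lam) = x * g(lam / x): even load split is optimal for convex g."""
    return x * g(lam / x, price)

def switching_cost(x_now: int, x_prev: int, beta: float = 6.0) -> float:
    """beta * (x_t - x_{t-1})^+: charge only for powering servers on."""
    return beta * max(x_now - x_prev, 0)

total = operating_cost(10, 4.0, price=1.0) + switching_cost(10, 8)
```

Note the trade-off the model captures: too few servers and the delay term blows up as per-server load approaches capacity; too many and the energy term dominates.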
Data center goal: min_{x_t ∈ F} Σ_t c_t(x_t) + ‖x_t − x_{t−1}‖, where c_t is convex.
x_1, c_1, x_2, c_2, x_3, c_3, … revealed online (stochastic? adversarial?). min_{x_t ∈ F} Σ_t c_t(x_t) [convex] + ‖x_t − x_{t−1}‖ [smoothed]. Smoothed online convex optimization (SOCO). Goal: Algorithms to minimize cost.
Round 1: choose x_1 ∈ F, then c_1 is revealed and we pay c_1(x_1). Round 2: choose x_2 ∈ F, then c_2 is revealed and we pay c_2(x_2) + ‖x_2 − x_1‖. And so on.
SOCO comes up in many applications: geographical load balancing, dynamic capacity management, video streaming, electricity generation planning, product/service selection, portfolio management, labor markets, penalized estimation, …
x_1, c_1, x_2, c_2, x_3, c_3, … online. min_{x_t ∈ F} Σ_t c_t(x_t) [convex] + ‖x_t − x_{t−1}‖ [smoothed]. Smoothed online convex optimization (SOCO). Goal: Algorithms to minimize cost. SOCO sits between 2 communities (online learning, online algorithms) with 2 metrics (regret, competitive ratio).
Regret(Alg) = Cost(Alg) − Cost(Static Opt). Competitive ratio(Alg) = Cost(Alg) / Cost(Offline Opt).
Can an algorithm maintain sub-linear regret? [KV02] [BBCM03] [Z03] [HAK07] [LRAMW12]
Can an algorithm maintain sub-linear regret? [KV02] [BBCM03] [Z03] [HAK07] [LRAMW12] Can an algorithm maintain a constant competitive ratio? [BKRS92] [BLS92] [BB00] [LWAT11] [LRAMW12] Can an algorithm maintain both sub-linear regret and a constant competitive ratio?
x_1, c_1, x_2, c_2, x_3, c_3, … online. min_{x_t ∈ F} Σ_t c_t(x_t) [convex] + ‖x_t − x_{t−1}‖ [smoothed]. Smoothed online convex optimization (SOCO). Goal: Algorithms with sub-linear regret. 2 communities: online learning, online algorithms. 2 metrics: regret, competitive ratio.
x_1, c_1, x_2, c_2, x_3, c_3, … online. min_{x_t ∈ F} Σ_t c_t(x_t) [convex]. Online convex optimization (OCO): the same setting without the smoothing term ‖x_t − x_{t−1}‖. Goal: Algorithms with sub-linear regret.
x_1, c_1, x_2, c_2, x_3, c_3, … online. min_{x_t ∈ F} Σ_t c_t(x_t), with bounded gradients ‖∇c_t(x_t)‖_2 ≤ C. Online convex optimization. Goal: Algorithms with sub-linear regret. Many algorithms achieve sub-linear regret, e.g., online gradient descent (OGD) [Z03]: x_{t+1} = Proj_F(x_t − η_t ∇c_t(x_t)).
Theorem [Z03]: When η_t = Θ(1/√t), online gradient descent has O(√T)-regret for OCO. Theorem [Z03, H06]: Any algorithm for OCO must incur Ω(√T)-regret on linear cost functions.
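The OGD update with the theorem's step size is a few lines. A minimal sketch for scalar OCO over a hypothetical feasible set F = [0, 10], with an illustrative fixed cost c_t(x) = (x − 5)² (nothing here is specific to the talk's workloads):

```python
import math

# Projected online gradient descent with eta_t = 1/sqrt(t), as in the
# theorem. Feasible set [0, 10] and cost functions are illustrative.
def project(x, lo=0.0, hi=10.0):
    """Proj_F: clamp onto the feasible interval."""
    return min(max(x, lo), hi)

def ogd(gradients):
    """gradients[t] maps the played point to the (sub)gradient of c_t there."""
    x, xs = 0.0, []
    for t, grad in enumerate(gradients, start=1):
        xs.append(x)                                  # play x_t; c_t revealed
        x = project(x - grad(x) / math.sqrt(t))       # x_{t+1} = Proj(x_t - eta_t grad)
    return xs

# Example: c_t(x) = (x - 5)^2, so the gradient at x is 2*(x - 5).
xs = ogd([lambda x: 2 * (x - 5)] * 50)
```

The iterates overshoot at first (the early step sizes are large) and then contract toward the minimizer at 5.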
x_1, c_1, x_2, c_2, x_3, c_3, … online. min_{x_t ∈ F} Σ_t c_t(x_t), with ‖∇c_t(x_t)‖_2 ≤ C. Online convex optimization. Goal: Algorithms with sub-linear regret. Many algorithms achieve sub-linear regret: online gradient descent (OGD) [Z03], multiplicative weights [AHK05] [FS99], Newton's-method-based algorithms [HKKA06] [HAK07].
x_1, c_1, x_2, c_2, x_3, c_3, … online. min_{x_t ∈ F} Σ_t c_t(x_t), with ‖∇c_t(x_t)‖_2 ≤ C. Online convex optimization. Do OCO algorithms maintain sub-linear regret for SOCO?
x_1, c_1, x_2, c_2, x_3, c_3, … online. min_{x_t ∈ F} Σ_t c_t(x_t) + ‖x_t − x_{t−1}‖ [smoothed], with ‖∇c_t(x_t)‖_2 ≤ C. Smoothed online convex optimization (SOCO). Do OCO algorithms maintain sub-linear regret for SOCO?
Corollary: When η_t = Θ(1/√t), online gradient descent still has O(√T)-regret for SOCO. Proof: Σ_t ‖x_t − x_{t−1}‖ ≤ M Σ_t ‖x_t − x_{t−1}‖_2 (equivalence of norms) ≤ M Σ_t η_{t−1} ‖∇c_{t−1}(x_{t−1})‖_2 (projection is non-expansive) ≤ MC Σ_t η_{t−1} = O(√T) (bounded gradients).
Can an algorithm maintain sub-linear regret? Yes: online gradient descent has O(√T)-regret. Can an algorithm maintain a constant competitive ratio? [BKRS92] [BLS92] [BB00] [LWAT11] [LRAMW12] Can an algorithm maintain both sub-linear regret and a constant competitive ratio?
x_1, c_1, x_2, c_2, x_3, c_3, … online. min_{x_t ∈ F} Σ_t c_t(x_t) [convex] + ‖x_t − x_{t−1}‖ [smoothed]. Smoothed online convex optimization (SOCO). Goal: Algorithms with constant competitive ratio. 2 communities: online learning, online algorithms. 2 metrics: regret, competitive ratio.
c_1, x_1, c_2, x_2, c_3, x_3, … online — note the order: in a metrical task system, c_t is revealed before x_t is chosen. min_{x_t ∈ F} Σ_t c_t(x_t) [convex] + ‖x_t − x_{t−1}‖ [smoothed]. Metrical task system (MTS). Goal: Algorithms with constant competitive ratio. 2 communities: online learning, online algorithms. 2 metrics: regret, competitive ratio.
c_1, x_1, c_2, x_2, c_3, x_3, … online. min_{x_t ∈ F} Σ_t c_t(x_t) [convex] + ‖x_t − x_{t−1}‖ [smoothed]. Metrical task system (MTS). Goal: Algorithms with constant competitive ratio. Is it possible to achieve a constant competitive ratio?
Theorem [BLS92] [BKRS92]: For an MTS with an action set of size n, any deterministic algorithm is Ω(n)-competitive, and any randomized algorithm is Ω(log n / log log n)-competitive.
c_1, x_1, c_2, x_2, c_3, x_3, … online. min_{x_t ∈ F} Σ_t c_t(x_t) [convex] + ‖x_t − x_{t−1}‖ [smoothed]. Metrical task system (MTS). Goal: Algorithms with constant competitive ratio. Is it possible to achieve a constant competitive ratio?
c_1, x_1, c_2, x_2, c_3, x_3, … online. min_{x_t ∈ F} Σ_t c_t(x_t) [convex, scalar x_t] + |x_t − x_{t−1}| [smoothed]. Smoothed online convex optimization (SOCO). Goal: Algorithms with constant competitive ratio. Is it possible to achieve a constant competitive ratio?
Theorem: Lazy Capacity Provisioning (LCP) is 3-competitive. Further, Cost(LCP) ≤ Cost(OPT) + 2·SwitchingCost(OPT). Is it possible to achieve a constant competitive ratio?
Lazy Capacity Provisioning (LCP): at each time t, compute two bounds over s = 1, …, t. x_t^U is the last entry of the minimizer of Σ_{s=1}^t c_s(x_s) + Σ_{s=1}^t (x_{s−1} − x_s)^+ subject to x_s ∈ F, and x_t^L is the last entry of the minimizer of Σ_{s=1}^t c_s(x_s) + Σ_{s=1}^t (x_s − x_{s−1})^+ subject to x_s ∈ F. LCP stays at x_{t−1} unless it falls outside [x_t^L, x_t^U], in which case it moves to the nearest endpoint.
Theorem: Lazy Capacity Provisioning (LCP) is 3-competitive. Further, Cost(LCP) ≤ Cost(OPT) + 2·SwitchingCost(OPT). Lemma: The offline optimal solution is "lazy" in reverse time.
Lazy Capacity Provisioning (LCP): x_t^U is the last entry of the minimizer of Σ_{s=1}^t c_s(x_s) + Σ_{s=1}^t (x_{s−1} − x_s)^+ subject to x_s ∈ F, and x_t^L is the last entry of the minimizer of Σ_{s=1}^t c_s(x_s) + Σ_{s=1}^t (x_s − x_{s−1})^+ subject to x_s ∈ F. The offline optimal trajectory stays within the band [x_t^L, x_t^U] at every t.
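For a small discrete action set {0, …, N}, the two one-way-switching-cost problems above can be solved by brute-force dynamic programming, and LCP then just clamps lazily into the resulting band. A minimal sketch (quadratic cost functions, N, and the workload are illustrative; the DP is recomputed from scratch each step for clarity, not efficiency):

```python
# Sketch of Lazy Capacity Provisioning on a discrete action set {0,...,N}.
def last_entry(costs, t, N, charge_up):
    """Last entry of the minimizer of sum_{s<=t} c_s(x_s) + switching,
    where switching charges only increases (charge_up=True, giving x_t^L)
    or only decreases (charge_up=False, giving x_t^U)."""
    switch = ((lambda p, x: max(x - p, 0)) if charge_up
              else (lambda p, x: max(p - x, 0)))
    V = [costs[0](x) + switch(0, x) for x in range(N + 1)]
    for s in range(1, t + 1):
        V = [costs[s](x) + min(V[p] + switch(p, x) for p in range(N + 1))
             for x in range(N + 1)]
    return min(range(N + 1), key=lambda x: V[x])

def lcp(costs, N):
    x, xs = 0, []
    for t in range(len(costs)):
        xU = last_entry(costs, t, N, charge_up=False)
        xL = last_entry(costs, t, N, charge_up=True)
        x = max(min(x, xU), xL)    # move only when pushed out of [xL, xU]
        xs.append(x)
    return xs

# Toy workload spike with hypothetical quadratic operating costs.
lam = [1, 1, 5, 5, 1, 1]
costs = [lambda x, l=l: (x - l) ** 2 for l in lam]
xs = lcp(costs, N=8)
```

On this instance LCP holds back while load is low and jumps into the band only when the spike forces it to, which is exactly the "lazy" behavior the competitive analysis exploits.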
c_1, x_1, c_2, x_2, c_3, x_3, … online. min_{x_t ∈ F} Σ_t c_t(x_t) [convex, scalar x_t] + |x_t − x_{t−1}| [smoothed]. Smoothed online convex optimization (SOCO). Goal: Algorithms with constant competitive ratio. Is it possible to achieve a constant competitive ratio?
Now add predictions: at time t, the algorithm also sees c_t, c_{t+1}, …, c_{t+w}. min_{x_t ∈ F} Σ_t c_t(x_t) [convex] + ‖x_t − x_{t−1}‖ [smoothed]. Smoothed online convex optimization (SOCO) with a prediction window. Goal: Algorithms with constant competitive ratio. Is it possible to achieve a constant competitive ratio?
Classic Approach: Receding Horizon Control Is it possible to achieve a constant competitive ratio?
Classic Approach: Receding Horizon Control, c t 1, c t, c t+1,, c t+w 1, c t+w, c t+w+1, x t 1 x t x t+1
Receding Horizon Control (RHC): choose x_t to minimize cost over [t, t+w], given the prediction window and x_{t−1}.
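RHC is easy to sketch on a discretized scalar action set: solve the windowed problem by dynamic programming, commit only the first step, then slide the window. Everything below (the quadratic costs, the grid, exact predictions) is an illustrative assumption:

```python
# Sketch of Receding Horizon Control for scalar SOCO, assuming exact
# predictions of the next w+1 cost functions. Instance is illustrative.
def horizon_opt(prev, window, grid):
    """Optimal trajectory for sum_k c_k(x_k) + |x_k - x_{k-1}| over the
    prediction window, starting from prev, via dynamic programming."""
    V = {x: (window[0](x) + abs(x - prev), [x]) for x in grid}
    for c in window[1:]:
        V = {x: min(((V[y][0] + c(x) + abs(x - y), V[y][1] + [x])
                     for y in grid), key=lambda p: p[0])
             for x in grid}
    return min(V.values(), key=lambda p: p[0])[1]

def rhc(costs, w, grid):
    """At each t, optimize over [t, t+w] but commit only the first step."""
    x, xs = 0, []
    for t in range(len(costs)):
        x = horizon_opt(x, costs[t:t + w + 1], grid)[0]
        xs.append(x)
    return xs

lam = [1, 1, 5, 5, 1, 1]
costs = [lambda x, l=l: (x - l) ** 2 for l in lam]
xs = rhc(costs, w=2, grid=range(9))
```

With w = 2 the controller sees the load spike coming and ramps up servers before it arrives.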
Receding Horizon Control (RHC): choose x_t to minimize cost over [t, t+w], given the prediction window and x_{t−1}. Theorem: For one-dimensional SOCO, RHC is (1 + O(1/w))-competitive. But in general (vector actions), RHC is (1 + Ω(1))-competitive: the ratio does not improve as w grows!
Instead of committing only the first step (RHC), Fixed Horizon Control (FHC) commits the whole window x_t, x_{t+1}, …, x_{t+w}; Averaging Fixed Horizon Control (AFHC) plays x_t as the average of staggered FHC trajectories.
Averaging Fixed Horizon Control (AFHC): choose x_t as the average of w + 1 fixed horizon control algorithms, staggered so that one of them re-plans at each step.
Averaging Fixed Horizon Control (AFHC): choose x_t as the average of w + 1 fixed horizon control algorithms. Theorem: AFHC is (1 + O(1/w))-competitive.
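The averaging step can be sketched directly: run the w + 1 staggered FHC controllers (each committing full windows) and average their trajectories coordinate-wise. The instance is illustrative, and the choice that each FHC(k) stays at the start state before its first planning instant is an assumption made for simplicity:

```python
# Sketch of Averaging Fixed Horizon Control. FHC(k) re-plans at
# t = k, k + w + 1, ... and commits whole windows; AFHC averages the
# w+1 staggered trajectories. Instance and start-state handling are
# illustrative assumptions.
def horizon_opt(prev, window, grid):
    """Optimal trajectory for sum_k c_k(x_k) + |x_k - x_{k-1}| (DP)."""
    V = {x: (window[0](x) + abs(x - prev), [x]) for x in grid}
    for c in window[1:]:
        V = {x: min(((V[y][0] + c(x) + abs(x - y), V[y][1] + [x])
                     for y in grid), key=lambda p: p[0])
             for x in grid}
    return min(V.values(), key=lambda p: p[0])[1]

def fhc(costs, w, grid, k):
    """FHC(k): stay at the start state until t = k, then commit full windows."""
    T = len(costs)
    xs, prev, t = [0] * min(k, T), 0, k
    while t < T:
        plan = horizon_opt(prev, costs[t:t + w + 1], grid)
        xs.extend(plan)
        prev = plan[-1]
        t += len(plan)
    return xs[:T]

def afhc(costs, w, grid):
    trajs = [fhc(costs, w, grid, k) for k in range(w + 1)]
    return [sum(step) / (w + 1) for step in zip(*trajs)]

lam = [1, 1, 5, 5, 1, 1]
costs = [lambda x, l=l: (x - l) ** 2 for l in lam]
xs = afhc(costs, w=2, grid=range(9))
```

Averaging is what makes the worst case benign: any single FHC copy can be caught re-planning at a bad moment, but not all w + 1 staggered copies at once.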
1. Can an algorithm maintain sub-linear regret? Yes: online gradient descent has Θ(√T)-regret. 2. Can an algorithm maintain a constant competitive ratio? Yes (scalar case): LCP is 3-competitive. Yes (vector case, with predictions): AFHC is (1 + O(1/w))-competitive. 3. Can an algorithm maintain both sub-linear regret and a constant competitive ratio? No!
Theorem: For arbitrary γ > 0 and any online algorithm A, deterministic or randomized, there exist linear cost functions on which A has either Ω(T) regret or competitive ratio at least γ. That is, sub-linear regret and a constant competitive ratio cannot be achieved simultaneously.
What now? Decide which metric you really care about: Is εT-regret good enough? Is O(log log T)-competitive good enough? What about under stochastic assumptions? Or balance the two: Theorem: Given a 1-dimensional SOCO, for arbitrary increasing f(T), Randomly Biased Greedy (RBG) is (1 + f(T))-competitive with O(max{T/f(T), f(T)})-regret.
Randomly Biased Greedy (f(T)): x_t = argmin_x w_t(x) + r·x, where ‖·‖_RBG = f(T)·‖·‖, the work function is w_t(x) = min_y { w_{t−1}(y) + c_t(y) + ‖x − y‖_RBG }, and r = UnifRand(−‖1‖_RBG, ‖1‖_RBG). Theorem: Given a 1-dimensional SOCO, for arbitrary increasing f(T), RBG is (1 + f(T))-competitive with O(max{T/f(T), f(T)})-regret.
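On a discretized scalar decision space the work-function recursion and the biased greedy choice above are a short loop. The grid, cost functions, initial work function w_0(x) = ‖x‖_RBG, and the value of f(T) below are illustrative reconstructions, not the talk's exact implementation:

```python
import random

# Sketch of Randomly Biased Greedy on a discretized scalar space, with
# ||z||_RBG = f(T)*|z|. Grid, costs, w_0, and f(T) are assumptions.
def rbg(costs, grid, f_T, seed=0):
    r = random.Random(seed).uniform(-f_T, f_T)  # r ~ Unif(-||1||_RBG, ||1||_RBG)
    w = {x: f_T * abs(x) for x in grid}         # w_0: distance from start state
    xs = []
    for c in costs:
        # Work function: w_t(x) = min_y { w_{t-1}(y) + c_t(y) + ||x-y||_RBG }
        w = {x: min(w[y] + c(y) + f_T * abs(x - y) for y in grid)
             for x in grid}
        # Biased greedy choice: x_t = argmin_x w_t(x) + r*x
        xs.append(min(grid, key=lambda x: w[x] + r * x))
    return xs

costs = [lambda x, a=a: (x - a) ** 2 for a in [1, 3, 1, 3]]
xs = rbg(costs, grid=[i / 2 for i in range(9)], f_T=2.0)
```

The knob f(T) scales the norm inside the work function: a large f(T) makes moving expensive, which improves regret (O(T/f(T))) at the cost of a worse competitive ratio, matching the trade-off in the theorem.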
1. Can an algorithm maintain sub-linear regret? Yes: online gradient descent has Θ(√T)-regret. 2. Can an algorithm maintain a constant competitive ratio? Yes (scalar case): LCP is 3-competitive. Yes (vector case, with predictions): AFHC is (1 + O(1/w))-competitive. 3. Can an algorithm maintain both? No! But (in the scalar case): RBG is O(f(T))-competitive with O(max{T/f(T), f(T)})-regret.
Back to the applications
An implementation: Dynamic resizing. Inputs: electricity prices, solar availability, PUE. Controls: dynamic resizing of the active servers, load shifting, valley filling.
An implementation: Dynamic resizing. [Screenshot: Wall Street Journal article.] HP collaborators: Yuan Chen, Cullen Bash, Martin Arlitt, Daniel Gmach, Zhikui Wang, Manish Marwah, and Chris Hyser.
A case study: Geographical load balancing. Data centers: Google's data center locations, with real traces for wind, solar, electricity price, etc. Proxy/mapping nodes: workload scaled by population density.
A case study: Geographical load balancing. GLB (geographical load balancing + dynamic resizing) vs. LOCAL (route to the closest data center + dynamic resizing).
The good: "follow the renewables" routing emerges. [Plot: number of servers on at east-coast vs. west-coast data centers over two days.]
The good: "follow the renewables" routing emerges, and huge reductions in grid usage become possible. [Plot: capacity needed to reduce grid usage by 85% vs. solar capacity, for LOCAL and GLB.]
The good: "follow the renewables" routing emerges, and huge reductions in grid usage become possible. The bad: GLB uses more energy if data centers don't have renewables available. [Plot: energy use of GLB vs. LOCAL over two days.]
The good: "follow the renewables" routing emerges, and huge reductions in grid usage become possible. The bad: GLB uses more energy if data centers don't have renewables available. The ugly: GLB uses dirtier grid electricity too!
The ugly: GLB uses dirtier grid electricity too! But demand response can help!
Final thoughts
Dynamic resizing and geographical load balancing: x_1, c_1, x_2, c_2, x_3, c_3, … online. min_{x_t ∈ F} Σ_t c_t(x_t) [convex] + ‖x_t − x_{t−1}‖ [smoothed]. Smoothed online convex optimization. Implementation.
Lots of interesting theoretical questions remain: Stochastic models for cost functions — better algorithms? Predictions are extremely important in practice — stochastic variations? The full cost function is not observed in practice — bandit versions? Can an algorithm balance regret and the competitive ratio? x_1, c_1, x_2, c_2, x_3, c_3, … online. min_{x_t ∈ F} Σ_t c_t(x_t) [convex] + ‖x_t − x_{t−1}‖ [smoothed]. Smoothed online convex optimization.
References: Minghong Lin, Adam Wierman, Lachlan Andrew, and Eno Thereska. Dynamic right-sizing for power-proportional data centers. Infocom 2011. Best Paper award winner. Zhenhua Liu, Minghong Lin, Adam Wierman, Steven Low, and Lachlan Andrew. Greening geographical load balancing. Sigmetrics 2011. Zhenhua Liu, Minghong Lin, Adam Wierman, Steven Low, and Lachlan Andrew. Geographical load balancing with renewables. Greenmetrics 2011. Best Student Paper award winner. Zhenhua Liu, Yuan Chen, Cullen Bash, Adam Wierman, et al. Renewable and cooling aware workload management for sustainable data centers. Sigmetrics 2012. Minghong Lin, Lachlan Andrew, and Adam Wierman. Online algorithms for geographical load balancing. International Green Computing Conference, 2012. Best Paper award winner.