Peak load reduction for distributed backup scheduling



Similar documents
LECTURE 4. Last time: Lecture outline

ON THE K-WINNERS-TAKE-ALL NETWORK

How performance metrics depend on the traffic demand in large cellular networks

On the Traffic Capacity of Cellular Data Networks. 1 Introduction. T. Bonald 1,2, A. Proutière 1,2

2.3 Convex Constrained Optimization Problems

Reinforcement Learning


Perron vector Optimization applied to search engines

Reinforcement Learning

Brownian Motion and Stochastic Flow Systems. J.M Harrison

INTEGRATED OPTIMIZATION OF SAFETY STOCK

Chapter 2: Binomial Methods and the Black-Scholes Formula

Basic Multiplexing models. Computer Networks - Vassilis Tsaoussidis


IEOR 6711: Stochastic Models, I Fall 2012, Professor Whitt, Final Exam SOLUTIONS

Center for Mathematics and Computational Science (CWI) Phone: (+31)

The Exponential Distribution

Modeling and Performance Evaluation of Computer Systems Security Operation 1

Random access protocols for channel access. Markov chains and their stability. Laurent Massoulié.

Coordinated Pricing and Inventory in A System with Minimum and Maximum Production Constraints

IN THIS PAPER, we study the delay and capacity trade-offs

Cloud for PC SOFT applications Pricing

Stochastic Inventory Control

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Further Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1

6.231 Dynamic Programming and Stochastic Control Fall 2008

ALOHA Performs Delay-Optimum Power Control


Single-Period Balancing of Pay Per-Click and Pay-Per-View Online Display Advertisements

Hydrodynamic Limits of Randomized Load Balancing Networks

Analysing the impact of CDN based service delivery on traffic engineering


Lecture notes on Moral Hazard, i.e. the Hidden Action Principle-Agent Model

Change Management in Enterprise IT Systems: Process Modeling and Capacity-optimal Scheduling

OPTIMAL CONTROL OF FLEXIBLE SERVERS IN TWO TANDEM QUEUES WITH OPERATING COSTS

Time-Of-Use Pricing Policies for Offering Cloud Computing as a Service

An Available-to-Promise Production-Inventory. An ATP System with Pseudo Orders

Load Balancing and Switch Scheduling

CHAPTER 3 SECURITY CONSTRAINED OPTIMAL SHORT-TERM HYDROTHERMAL SCHEDULING

QoS for Cloud Computing!

Passive Discovery Algorithms

The Feasibility of Supporting Large-Scale Live Streaming Applications with Dynamic Application End-Points

Optimization of Supply Chain Networks

Periodic Capacity Management under a Lead Time Performance Constraint

Notes from Week 1: Algorithms for sequential prediction

Binomial lattice model for stock prices

The Ergodic Theorem and randomness

Cooperative Multiple Access for Wireless Networks: Protocols Design and Stability Analysis

Graph Theory and Complex Networks: An Introduction. Chapter 08: Computer networks

Discrete Dynamic Optimization: Six Examples

M/M/1 and M/M/m Queueing Systems

A Passivity Measure Of Systems In Cascade Based On Passivity Indices

Optimal Taxation in a Limited Commitment Economy

1. (First passage/hitting times/gambler s ruin problem:) Suppose that X has a discrete state space and let i be a fixed state. Let

Chapter 6. Linear Transformation. 6.1 Intro. to Linear Transformation

The Joint Distribution of Server State and Queue Length of M/M/1/1 Retrial Queue with Abandonment and Feedback

Characterization and Modeling of Packet Loss of a VoIP Communication

On the Oscillation of Nonlinear Fractional Difference Equations

Joint Optimization of Overlapping Phases in MapReduce

Stochastic Models for Inventory Management at Service Facilities

Buy Low and Sell High

Basics of information theory and information complexity

Earliest Due Date (EDD) [Ferrari] Delay EDD. Jitter EDD

HERMES: Human Exposure and Radiation Monitoring of Electromagnetic Sources.

Detecting Multiple Selfish Attack Nodes Using Replica Allocation in Cognitive Radio Ad-Hoc Networks

From Binomial Trees to the Black-Scholes Option Pricing Formulas

Optimal Hiring of Cloud Servers A. Stephen McGough, Isi Mitrani. EPEW 2014, Florence

Research Article Two-Period Inventory Control with Manufacturing and Remanufacturing under Return Compensation Policy

Staffing and Control of Instant Messaging Contact Centers

Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff

Offline sorting buffers on Line

Airport Planning and Design. Excel Solver

A joint chance-constrained programming approach for call center workforce scheduling under uncertain call arrival forecasts

Transcription:

Peak load reduction for distributed backup scheduling Peter van de Ven joint work with Angela Schörgendorfer (Google) and Bo Zhang (IBM Research)

2002-2007 2014-2007-2011 2011-2014 2006-2007

My research interests are in the design and analysis of large (distributed) stochastic systems: - performance analysis; scheduling; stochastic optimization and control

Part I: Peak load reduction for distributed backup scheduling

load 8am 6pm Load is highly varying throughout the day - network costs are non-linear in the hourly load - aim to reduce network costs by shifting peak load

load backups 8am 6pm - the load is comprised of different types of traffic - we focus on the backup traffic

load backups 8am 6pm Users (workstations) aim to do daily backups - the backup transmitted across the network to a central server - backup traffic can be shifted because of delay-tolerance

load load backups backups 8am 6pm 8am 6pm current target Backups are initiated by the users, based on local information only - backup policy is characterized by hourly backup probabilities However - users are not always connected - the server may have some capacity constraint

load load backups backups 8am 6pm 8am 6pm current target Question 1: How much traffic can the network handle for given backup probabilities? Question 2: How to choose the backup probabilities to minimize the network costs while ensuring regular backups?

Model outline

N users network server Consider N users that generate data (bits) over time - user backup to a central server - access to server over a data network

A 1(t) B(t) i A 2(t) A 3(t) A 4(t) A 5(t) network server - time is slotted - user i generates A i (t) bits in slot t - data is added to the backlog B i (t)

A 1(t) B(t) i A 2(t) A 3(t) A 4(t) A 5(t) network C i (t) server - users can only do a backup when connected to the network - connectivity is represented by C i (t) {0, 1}, i = 1,..., N

A 1(t) B(t) i σ i (t) A 2(t) A 3(t) A 4(t) A 5(t) network C i (t) server Connected users may decide to do a backup (σ i (t) = 1) or not (σ i (t) = 0)

A 1(t) B(t) i σ i (t) A 2(t) A 3(t) A 4(t) A 5(t) network C i (t) server Connected users may decide to do a backup (σ i (t) = 1) or not (σ i (t) = 0) The backlog evolution and offered backup load are given by B i (t + 1) = (A i (t) + B i (t))(1 C i (t)σ i (t)), N H(t) = (A i (t) + B i (t))c i (t)σ i (t) i=1

User type & Backup probability Consider the number of days since backup W i (t) {0, 1,... } (user type): W i (t + 1) = W i (t) ( 1 σ i (t)c i (t) ) + 1 {η(t+1)=0}, with η(t) := t mod T the hour of the day

User type & Backup probability Consider the number of days since backup W i (t) {0, 1,... } (user type): W i (t + 1) = W i (t) ( 1 σ i (t)c i (t) ) + 1 {η(t+1)=0}, with η(t) := t mod T the hour of the day The backup decision is random and may depend on W i (t) and the hour of the day u(t) {0,..., T 1}: { 1 w.p. ν(u(t), Wi (t)) σ i (t) = 0 otherwise

Capacity & network cost minimization

Let W(t) = (W 1 (t),..., W N (t)) and consider the Markov process U := {(η(t), W(t)), t = 0, 1,... } For fixed ν, we can analyze U to obtain insights into the backup system

The stationary distribution π of the process U satisfies where π(u, w) = π(0, 1)k 1 (u, w), u = 0,..., T 1; w 1, π(u, 0) = π(0, 1)k 2 (u), u = 1,..., T 1, w 1 k 1 (u, w) := m(0, u 1, w) m(0, T 1, y), k 2 (u) := u v=1 y=1 and with normalizing constant [ π(0, 1) = y=1 k 1 (v 1, y)c(v 1)ν(v 1, y), T 1 w=1 u=0 T 1 k 1 (u, w) + u=1 k 2 (u)] 1.

Let u m(u, u, w) := (1 c(v)ν(v, w)), u, u = 0,..., T 1; w = 0, 1,..., v=u and define g(u, w) := 1 m(u, T 1, w)m(0, u 1, w + 1), u, u = 0,..., T 1; w = 0, 1,..., Proposition If α(u) := lim w w( g(u, w) w 1) exists and α(u) (0, ) for all u {0, 1,..., T 1}, then U is positive recurrent.

Lemma The expected backlog size of a type-w user at hour u: S(u, w) = where T 1 u =0 = p(u, u, w)e π [ B(t) η(t) = u, W (t) = w, η(τ (t)) = u ]. [ E π B(t) η(t) = u, W (t) = w, η(τ (t)) = u ] p(u, u, w) = u 1 v=u +1 T 1 v=u +1 w =0 w =0 a(v), w = 0, a(v) + (w 1)â + u 1 v=0 a(v), w 1, π(u,w ) π(u,0) c(u )ν(u, w )m(u + 1, u 1, 0), w = 0, π(u,w ) π(0,1) c(u )ν(u, w )m(u + 1, T 1, 0), w 1.

load B(t) L(t) 8am 6pm We consider the sum of H(t) (backup load) and L(t) (extraneous load) - denote H(t; ν) to reflect the impact of ν

Network costs log10(kb) 2 3 4 5 6 The data network is associated with a load-dependent hourly cost function g : R R We are interested in the average per-day network costs (T = 24) G(ν) = T 1 u=0 E{g ( L(u) + H(u; ν) ) },

We approximate G(ν) G(ν) := T 1 u=0 g ( E [ L(u) ] + E π [ H(u; ν) ]), The E [ L(u) ] can be obtained from measurements. Exploiting user independence: E π [H(u; ν)] = NT π(u, w) ( S(u, w) + a(u) ) c(u)ν(u, w), w=0

We aim to minimize the network costs, while ensuring regular backups min ν G(ν) such that T w=w j π(0, w) γ j, j = 1, 2,.... This problem is non-convex.

Implementation

We consider an IBM location with N = 5397 users 25 L u c u 0.7 20 0.6 15 0.5 0.4 10 0.3 5 0.2 0.1 5 10 15 20 u 5 10 15 20 u load L(u) connectivity c(u)

w = 1 w = 2 w = 3 w = 4 w 5 We truncate the state space at w = 5 and compute the ν(u, w) - the probability is high at night and low during working hours - users that have not done a backup in a while are more likely to schedule during peak hours

Comparison to original scheduling table Europe team s schedule: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0.5 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 0.1 0 0 0 0 0 0 0.12 0.5 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 0.1 0 0 0 0.12 0.14 0 0.16 0.5 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 0.1 0 0 0 0.12 0.14 0.2 0.33 0.5 1 1 1 1 1 1 1 =5 1 1 1 1 1 1 1 1 0.1 0.12 0.14 0.14 0.16 0.19 0.25 0.33 0.5 1 1 1 1 1 1 1 Our schedule: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 1 1 1 1 1 1 1 1 0.81 0.06 0 0 0.07 0.13 0.05 0.14 0.47 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 0.15 0.07 0 0 0.04 0.04 0.08 0.04 0.06 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 0.21 0.08 0 0 0.12 0.09 0.06 0.09 0.2 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 0.3 0.09 0 0 0.1 0.11 0.09 0.15 0.18 1 1 1 1 1 1 1 1 =5 1 1 1 1 1 1 1 0.51 0.41 0 0 0.33 0.3 0.28 0.31 0.33 0.67 1 1 1 1 1 1 1

Comparison to original scheduling table - Performance days without backup total load

Constrained system β Consider a constraint on the total amount of backup traffic: B i (t + 1) = (A i (t) + B i (t))(1 C i (t)σ i (t) min{1, β/b(t)}) Users are no longer independent - stability conditions and proofs are more involved - use mean field limit to compute long-term average behavior