Decentralized control of stochastic multi-agent service system. J. George Shanthikumar Purdue University



Similar documents
Duality in General Programs. Ryan Tibshirani Convex Optimization /36-725

Role of Stochastic Optimization in Revenue Management. Huseyin Topaloglu School of Operations Research and Information Engineering Cornell University

Nonlinear Optimization: Algorithms 3: Interior-point methods

Name: ID: Discussion Section:

A Utility Maximization Example

Spatial Decomposition/Coordination Methods for Stochastic Optimal Control Problems. Practical aspects and theoretical questions

An Improved Dynamic Programming Decomposition Approach for Network Revenue Management

6.231 Dynamic Programming and Stochastic Control Fall 2008

Further Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1

Principles of demand management Airline yield management Determining the booking limits. » A simple problem» Stochastic gradients for general problems

constraint. Let us penalize ourselves for making the constraint too big. We end up with a

Moral Hazard. Itay Goldstein. Wharton School, University of Pennsylvania

MAT 274 HW 2 Solutions c Bin Cheng. Due 11:59pm, W 9/07, Points

Linear Programming Notes V Problem Transformations

On Adaboost and Optimal Betting Strategies

Insurance. Michael Peters. December 27, 2013

A New Quantitative Behavioral Model for Financial Prediction

10. Proximal point method

Controlling the Internet in the era of Software Defined and Virtualized Networks. Fernando Paganini Universidad ORT Uruguay

Linear Threshold Units

Principle of Data Reduction

Two-Stage Stochastic Linear Programs

Lecture Notes 10

Alok Gupta. Dmitry Zhdanov

A few algorithmic issues in data centers Adam Wierman Caltech

Numerisches Rechnen. (für Informatiker) M. Grepl J. Berger & J.T. Frings. Institut für Geometrie und Praktische Mathematik RWTH Aachen

A Randomized Linear Programming Method for Network Revenue Management with Product-Specific No-Shows

Economics of Insurance

Online Algorithms: Learning & Optimization with No Regret.

Reinforcement Learning

Recovery of primal solutions from dual subgradient methods for mixed binary linear programming; a branch-and-bound approach

Envelope Theorem. Kevin Wainwright. Mar 22, 2004

Lecture 2: Consumer Theory

Simulating Stochastic Differential Equations

Stochastic Models for Inventory Management at Service Facilities

Economics 326: Duality and the Slutsky Decomposition. Ethan Kaplan

Optimal shift scheduling with a global service level constraint

JANUARY 2016 EXAMINATIONS. Life Insurance I

Numerical methods for American options

Scheduling Algorithms for Downlink Services in Wireless Networks: A Markov Decision Process Approach

12.5: CHI-SQUARE GOODNESS OF FIT TESTS

General Theory of Differential Equations Sections 2.8, , 4.1

GI01/M055 Supervised Learning Proximal Methods

The Optimal Growth Problem

High-frequency trading in a limit order book

1 Introduction. Linear Programming. Questions. A general optimization problem is of the form: choose x to. max f(x) subject to x S. where.

15 Kuhn -Tucker conditions

A NEW LOOK AT CONVEX ANALYSIS AND OPTIMIZATION

Course: Model, Learning, and Inference: Lecture 5

Gambling and Data Compression

A Log-Robust Optimization Approach to Portfolio Management

Functional Optimization Models for Active Queue Management

Introduction to Auction Design

Super-Agent Based Reputation Management with a Practical Reward Mechanism in Decentralized Systems

CHAPTER 9. Integer Programming

Optimal Income Taxation with Multidimensional Types

Date: April 12, Contents

Government Debt Management: the Long and the Short of it

Statistics 100A Homework 8 Solutions

Hydrodynamic Limits of Randomized Load Balancing Networks

Integrating Benders decomposition within Constraint Programming

SINGLE-STAGE MULTI-PRODUCT PRODUCTION AND INVENTORY SYSTEMS: AN ITERATIVE ALGORITHM BASED ON DYNAMIC SCHEDULING AND FIXED PITCH PRODUCTION

On the Interaction and Competition among Internet Service Providers

Inflation. Chapter Money Supply and Demand

REVIEW OF MICROECONOMICS

The Multicut L-Shaped Method

Lecture Cost functional

CHAPTER 14. Z(t; i) + Z(t; 5),

Manual for SOA Exam MLC.

Algorithmic Mechanism Design for Load Balancing in Distributed Systems

Lecture notes on Moral Hazard, i.e. the Hidden Action Principle-Agent Model

A Simple Model of Price Dispersion *

Health Insurance and Retirement Incentives

Distributed Machine Learning and Big Data

Volatility, Productivity Correlations and Measures of. International Consumption Risk Sharing.

How To Play A Static Bayesian Game

Stepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection

1 Uncertainty and Preferences

Supplement to Call Centers with Delay Information: Models and Insights

Systems with Persistent Memory: the Observation Inequality Problems and Solutions

Nonlinear Programming Methods.S2 Quadratic Programming

Economics 200B Part 1 UCSD Winter 2015 Prof. R. Starr, Mr. John Rehbeck Final Exam 1

Linear Programming Notes VII Sensitivity Analysis

24. The Branch and Bound Method

Summer course on Convex Optimization. Fifth Lecture Interior-Point Methods (1) Michel Baes, K.U.Leuven Bharath Rangarajan, U.

Politecnico di Torino, Short Course on: Optimal Control Problems: the Dynamic Programming Approach

Transcription:

Decentralized control of stochastic multi-agent service system J. George Shanthikumar Purdue University Joint work with Huaning Cai, Peng Li and Andrew Lim 1

Many problems can be thought of as stochastic dynamic resource allocation problems: Airline alliances Investment firms with multiple trading desks Large scale service systems, call centers... Objective: operate the integrated system efficiently. 2

Centralized decision making is often not possible: different groups specialize in different tasks. relevant information is not/can not be shared integrating IT systems across different groups is hard. decentralized agents 3

Decentralized control involves many things: different agents have knowledge and control of different parts of the system different agents can observe different subsets of the state e.g. Witsenhausen (1968), Ho (1980), Rantzer (2009), Bertsekas (2007), Adelman & Mersereau (2008) different agents have different incentives Key issues: Each decision maker maximizes its own objectives conditional on the available information. In general: local optimality global optimality How do we efficiently manage such multi-agent systems? 4

Overview Centralized formulation of the pricing-service problem Decentralized model Stochastic models for pricing and service agents Coordination mechanism Pricing agent s problem Service agent s problem Main results Coordinating through transfer prices Computing transfer prices with info. sharing constraints Example 5

Problem description arrivals occur randomly at a rate determined by the prices dynamically controlled by (multiple) pricing agents. service times are random and determined by the dynamically controlled service rate net profits = revenue from arrivals - service costs 6

Standard approach: centralized model max E p( ),µ( ) T 0 m j=1 p j (t )dn j (T) T 0 c(x(t), µ(t))dt h(x(t)) dx(t) = m j=1 dn j (t) dj(t), x(0) = x 0 0 x(t) C, t [0, T], N i (t) λ i (p i (t )), J(t) µ(t ). Assumes there is and all-knowing dictator jointly controlling all pricing and service decisions. 7

It is possible to write down the dynamic programming equations 0 = V t (t, x)+max p,µ { c(x, µ)+µ ( V(t, x 1) V(t, x) )} i λ i (p i ) [ p i ( V(t, x) V(t, x+1) )] V(t, x) = h(x) but this is irrelevant if no single agent knows all demand models λ i (p i ) and the cost structure (c(x, µ), h(x)) and has the power to jointly control all decision variables. 8

Decentralized control 9

Pricing agent i controls p i (t) and service agent controls µ(t). Integrated system profits & dynamics: E T m 0 j=1 p j (t )dn j (T) T 0 c(x(t), µ(t))dt h(x(t)) dx(t) = m j=1 dn j (t) dj(t), x(0) = x 0 0 x(t) C, t [0, T], N i (t) λ i (p i (t )), J(t) µ(t ). Coupling comes through the state variable x(t). Key issue: what do pricing/service agents know and how do they make decisions? 10

Pricing agent controls p i (t) and knows demand function λ i (p i ). may not know the number of other agents, let alone their demand functions λ j (p j ), their pricing policies p j (t, x) or the aggregate arrival rate. may not know cost functions (c(x, µ), h(x)) for service agent or the service policy µ(t). observes x(t) but may not know its true dynamics. 11

Pricing agent s model where dx(t) = dn i (t) dñ i (t) d J i (t) N i (t) λ i (p i ) is the arrival process controlled by agent i (accurately specified) Ñ i (t) λ i (t) is agent i s model for arrivals brought by other agents (mis-specified) J i (t) µ i (t) is agent i s model for the service system (mis-specified) 12

Service agent Observes x(t) and assumes dynamics where dx(t) = dñ 0 (t) dj 0 (t) Ñ 0 (t) λ 0 (t) is the service agent s aggregate arrival process model; (mis-specified) J 0 (t) µ(t) is the service process (accurately specified). 13

Local optimality does not give global optimality Key observation: no choice of λ i (t) or µ i (t) result in a centrally optimal choice of p i (t). e.g. Pricing agent i s problem: max p i E T 0 p i(t )dn i (t) = E T dx(t) = dn i (t)+dñ i (t) d J i (t) 0 x(t) C 0 p i(t )λ i (p i (t ))dt N i (t) λ i (p i (t )), Ñ i (t) λ i (t), J i (t) µ i (t) Trivial case: when C = the pricing agent maximizes revenue rate p i λ i (p i ) which is not optimal for the system. 14

Transfer contracts and revenue sharing 15

Revenue transfer when pricing agent 2 makes a sale Service agent p2(t,x) S(t,x) 1 2 3 Q1(t,x) Q3(t,x) Pricing agents Income for Agent 2 = p 2 (t, x(t )) Q j (t, x(t )) S(t, x(t )) j 2 16

Revenue transfer when service agent clears a customer Service agent C1(t,x) C3(t,x) C2(t,x) 1 2 3 Pricing agents Income for service agent = m i=1 C i (t, x(t )) 17

Revenue sharing/transfer prices When pricing agent i brings in a customer and changes state from x x+1 at time t: pay Q j (t, x) to pricing agent j; pay S(t, x) to service agent. total cost: R i (t, x) = j iq j (t, x)+s(t, x) When service agent completes service and changes state from x x 1 at time t: receive C i (t, x) from pricing agent i. total revenue: R 0 (t, x) = j C j (t, x(t )) 18

Pricing agent s problem Given transfer prices C i, Q i and R i (t, x) = j iq j (t, x)+s(t, x) pricing agent i solves T [ E p i 0 T max + p i (t ) R i (t, x(t )) 0 Q i(t, x(t ))dñ i (t) T dx(t) = dn i (t)+dñ i (t) d J i (t) 0 x(t) C ] dn i (t) 0 C i(t, x(t ))d J i (t) N i (t) λ i (p i (t )), Ñ i (t) λ i (t), J i (t) µ i (t) } {{ } accurate } {{ } mis-specified 19

Service agent s problem Given transfer prices S and R 0 (t, x) = j C j (t, x(t )) service agent solves min E µ(t ) T T T c(x(t), µ(t))dt S(t, 0 0 x(t ))dñ 0 (t) 0 R 0(t, x(t ))dj 0 (t)+h(x(t)) dx(t) = dñ 0 (t) dj 0 (t), x(0) = x 0 0 x(t) C Ñ 0 (t) λ 0 (t), J 0 (t) µ(t) } {{ } mis-specified } {{ } accurate 20

Weak duality For any choice of transfer prices Centralized (opt.) profit {agent j s (max.) profit} service agents (min.) cost j (RHS depends on the choice of transfer contracts) Is it possible to choose transfer prices that achieve equality? What is the impact of model mis-specification by the agents? 21

Consider the transfer contracts Main Result: General system Q j (t, x) = V j (t, x) V j (t, x+1) S(t, x) = W(t, x+1) W(t, x) C j (t, x) = V j (t, x 1) V j (t, x) where 0 = d dt V i(t, x)+maxλ p i (p i ) {p i [ V(t, x) V(t, x+1) ]} i and V i (T, x) = 0 0 = d W(t, x)+min {c(x, µ) µ [ V(t, x) V(t, x 1) ]} dt µ W(T, x) = h(x) Note: V(t, x) is the value function for the centralized problem. 22

Contracts coordinate the system; Strong duality: Centralized (opt.) profit {agent j s (max.) profit} service agents (min.) cost = j Robustness to mis-specification: p j (t) is independent of agent j s assumptions about the other pricing agents and the service agent µ (t) is independent of the service agent s assumptions about the pricing agents. 23

Intuition: Optimal contracts W(t, x) is the value function for the service agent and V j (t, x) as the value function of pricing agent j under optimal TP s. When pricing agent j makes a sale: x x+1 W(t, x+1) W(t, x) = service agent s expected cost increase V i (t, x+1) V i (t, x) = exp. revenue decrease for p. agent i. When service agent clears a customer: x x 1 V i (t, x 1) V i (t, x)= value of extra space to p. agent i. 24

Under optimal transfer prices: when pricing agent i makes a sale: compensate other pricing agents their loss in revenue Q j (t, x) = V j (t, x) V j (t, x+1) compensate the service agent the cost of processing another customer S(t, x) = W(t, x+1) W(t, x) When service clears a customer, each pricing agent pays the service agent its valuation of the extra resource (1 spot) C j (t, x) = V j (t, x 1) V j (t, x) 25

Intuition: Robustness to mis-specification If an agent is compensated his valuation of the shared resource, he/she is indifferent as to whether it is used by another agent indifference to rate of consumption by others robustness to mis-specification. 26

Additional comments TP s summarize what is important about the other agents: Details such as the number of other agents and dynamics are not required. TP s can be interpreted as Lagrange multipliers for a stochastic dynamic resource allocation problem. Convexity was not used to establish strong duality. 27

Key gap: How do we compute transfer prices without needing to solve the centralized problem? i.e. how do we compute optimal contracts when there is no mechanism designer. 28

Recall: Contracts Computing the transfer prices Q j (t, x) = V j (t, x) V j (t, x+1) S(t, x) = W(t, x+1) W(t, x) C j (t, x) = V j (t, x 1) V j (t, x) where 0 = d dt V i(t, x)+maxλ p i (p i ) {p i [ V(t, x) V(t, x+1) ]} i and V i (T, x) = 0 0 = d W(t, x)+min {c(x, µ) µ [ V(t, x) V(t, x 1) ]} dt µ W(T, x) = h(x) Problem: Requires knowledge of the value function for the centralized problem. 29

A fixed point representation Q j (t, x) = V j (t, x) V j (t, x+1) S(t, x) = W(t, x+1) W(t, x) C j (t, x) = V j (t, x 1) V j (t, x) where 0 = d { W(t, x)+min c(x, µ) dt µ µ [ m j=1 + λ 0 (t, x) C j (t, x)+w(t, x) W(t, x 1) ]} { W(t, x+1) W(t, x) S(t, x) } W(T, x) = h(x) 30

A fixed point representation (cont ) 0 = d dt V i(t, x)+maxλ p i (p i ) {p i [ Q j (t, x)+s(t, x) ] i j i [ V i (t, x) V i (t, x+1) ]} + λ i (t, x) {Q i (t, x) [ V i (t, x) V i (t, x+1) ]} { } + µ i (t, x) V i (t, x 1) V i (t, x) C i (t, x) V i (T, x) = 0 31

Fixed point representation Find Q = [Q 1,, Q m ], C = [C 1,, C m ] and S such that: Q j (t, x) = V j (t, x) V j (t, x+1) S(t, x) = W(t, x+1) W(t, x) (C j (t, x) = V j (t, x 1) V j (t, x) = Q j (t, x 1)) where V 1 (t, x) value function for pricing agent 1. V m (t, x) value function for pricing agent m W(t, x) value function for service agent Natural idea: Iterate of Q, C and S. 32

Observations: The ODEs are DP equations for the pricing and service problems under contracts C, Q and S. The fixed point equation relates optimal contracts to value functions of the associated decentralized problems. 33

Interpretation as a negotiation current valuation Qi current cost S PAi Central computer SA new contracts new contracts 34

Main result Under standard technical assumptions, the algorithm converges to the optimal transfer prices. Comments: Assumes truth telling in reporting of valuations in each round. Iterative algorithm is analogous to updating dual variables. Convexity is not required for convergence or strong duality. (However, we assume each agent solves a DP at each iteration exactly) All that is communicated between agents are contracts constructed by aggregating transfer prices. (The number of agents and their models are not needed) 35

Example Single pricing and service agent; pricing agent: demand rate λ(p) = ae bp ; assumes µ 0 (mis-specified); service agent: costs c(x,µ) = kx+c(e dµ 1) and h(x) = sx; assumes λ 0 (mis-specified); a = 10,b = 1,c = 1,d = 1,k = 0.1,s = 1.3,C = 10,T = 10 36

Convergence check: V {V p n W n } as n 14 Convergence Check V 1 +V 2 V max 12 10 8 6 4 2 0 1 2 3 4 5 6 7 8 9 10 iteration 37

Optimal revenue sharing functions 2 1.9 1.8 Tax function for pricing agent x=2 x=6 R 1 (t,x) 1.7 1.6 1.5 1.4 1.3 0 2 4 6 8 10 t When close to the terminal time T = 10 the pricing agent s tax converges to the cost of not serving the customer. In this case, h(x) = 1.3x (each unserved customer costs 1.3). 38

0.35 Bonus function for service agent R 2 (t,x) 0.3 0.25 0.2 0.15 0.1 0.05 0 x=6 x=2 0.05 0 2 4 6 8 10 t When close to terminal time T = 10, the service agent receives less reward for clearing a customer. Intuition: The value of an extra space is of diminishing value as the selling time decreases. 39

Conclusion Coordination through transfer prices TP s summarize what is important about other agents. Iterative algorithm for computing TP s that only require exchange of valuations Drawbacks: Assumption of truth-telling when exchanging valuations Each agent needs to solve a DP ( approximations) Extensions: heterogeneous risk attitudes multi-agent portfolio choice problems airline alliances 40