# OPTIMIZATION AND OPERATIONS RESEARCH Vol. IV - Markov Decision Processes - Ulrich Rieder


MARKOV DECISION PROCESSES

Ulrich Rieder, University of Ulm, Germany

Keywords: Markov decision problem, stochastic dynamic program, total reward criteria, average reward, optimal policy, optimality equation, backward induction algorithm, policy iteration, policy improvement, linear programming.

Contents

1. Introduction
2. Problem Definition and Examples
3. Finite Horizon Decision Problems
3.1 The Backward Induction Algorithm
3.2 Monotonicity of Optimal Policies
4. Infinite Horizon Markov Decision Problems
4.1 Total Reward Criteria
4.2 Computational Methods: Policy Improvement Algorithm, Linear Programming
4.3 Average Reward Problems
4.4 Computational Methods: Policy Improvement Algorithm, Linear Programming
5. Continuous-time Markov Decision Processes
5.1 Total Reward Decision Processes
5.2 Average Reward Decision Processes
6. Further Topics
Glossary
Bibliography
Biographical Sketch

Summary

We consider Markov decision processes in discrete and continuous time with a finite or countable state space and an arbitrary action space. They are mathematical models for dynamic systems that can be controlled by sequential decisions and that contain stochastic elements. The main areas of application are operations research, computer science, engineering and statistics. Typical problems arise, e.g., in queueing, inventory and investment models. First, Markov decision processes are introduced and some examples are presented. In Section 3, we consider finite horizon decision problems and explain the backward induction algorithm. Section 4 is concerned with infinite horizon Markov decision problems. For each optimality criterion, results are formulated in as much generality as possible. Computational methods such as policy iteration, policy improvement and linear programming are discussed. Section 5 introduces continuous-time Markov decision processes. These problems can be reduced to discrete-time problems and thus can be solved in the same way. The last section identifies some

extensions and further stochastic decision processes.

## 1. Introduction

Markov decision processes are dynamical systems that can be controlled by sequential decisions. The successive state transitions are uncertain, but the probability distribution of the next state, as well as the immediate reward, depends only on the current state and the current decision. These models are also known as stochastic dynamic programs or stochastic control problems. They have been applied in many different subject areas, such as queueing systems, inventory models, and investment models (see Queuing Systems, Inventory Models, Investment Models).

Markov decision processes serve as models for controlled dynamic systems of the following type. At decision time points, the controller of the system is allowed to choose an action according to rules given by the system. The choice is based on knowledge of the current state and has two consequences: it generates an immediate reward for the decision-maker, and it determines a probability distribution for the next state of the system. The aim is to select a sequence of actions, also called a policy, in such a way as to maximize an overall performance criterion. Since the choice of actions influences the transition probabilities of the system, the controller has to be aware of all consequences of his chosen decisions.

These stochastic control problems can roughly be classified into:

- continuous-time or discrete-time decision processes, depending on which decision time points are allowed;
- finite or infinite horizon decision problems;
- total discounted or average expected reward decision problems, when the time horizon is infinite.

Throughout our exposition, we restrict ourselves to stochastic decision processes with finite or countable state spaces.
The main objectives of the theory of Markov decision processes are to characterize optimal policies and the optimal value of the problem, as well as to provide computational methods to identify optimal policies and the associated values. Basic to all these problems are the optimality equations or Bellman equations. Depending on the optimality criterion, they take different forms (see Dynamic Programming and Bellman's Principle). The theory of Markov decision processes has been extensively developed since the first books by Bellman (1957) and Howard (1960) and is still an active area of research with various open questions.

In Sections 2 and 3, we first deal with finite horizon problems. Some examples are presented and we explain the backward induction algorithm. Infinite horizon problems with a discrete time parameter are considered in Section 4, where we investigate both the expected total reward problem and the expected average reward problem. Finally, Section 5 is concerned with continuous-time Markov decision processes. Throughout the exposition, the focus lies on computational methods. Thus, we discuss methods such as policy iteration, policy improvement and linear programming.

## 2. Problem Definition and Examples

We start our investigations with discrete-time sequential decision problems over a finite time horizon. The decision-maker has to choose actions at the time points 0, 1, 2, ..., N − 1. These time points are called decision epochs or stages, and N is the planning horizon of the problem. At stage n, the system is in some state i ∈ S_n; S_n is called the state space at stage n. The controller then has to choose an action from the set A_n. However, depending on the current state i ∈ S_n, not all actions may be admissible. The admissible actions in state i at stage n are collected in a set A_n(i) ⊂ A_n. Upon the choice of an action a ∈ A_n(i), the decision-maker obtains the reward r_n(i, a). Moreover, the action determines the transition law with which the system moves into the new state: p_n(i, a, j) denotes the probability that the next state of the system is j ∈ S_{n+1}, given that the system is in state i ∈ S_n and action a is applied. p_n(i, a, j) is called the transition probability at stage n. When the final state i ∈ S_N is reached at the planning horizon, the controller obtains a final reward g_N(i). The collection of the objects (S_n, A_n, p_n, r_n, g_N) is called an N-stage stochastic dynamic program or Markov decision process. It is assumed that all state spaces S_n are finite or countable and that all reward functions r_n and g_N are bounded from above.

A classical example of a Markov decision process is an inventory control problem. The situation here is as follows. Since the demand for a product is random, a warehouse will keep an inventory of this product to cope with this uncertainty. Every month, the manager has to decide how many items of the product should be ordered, based on knowledge of the current stock level. Of course, the order should not be too large, since keeping the inventory is expensive. On the other hand, the order should not be too small, since this could lead to lost sales or penalties for being unable to satisfy the customers' demand.
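It may help to see the objects (S_n, A_n, p_n, r_n, g_N) in executable form before the examples are formalized. The following minimal Python sketch sets up an N-stage model and runs the backward induction recursion mentioned in Section 3.1; the two-state model and all numbers are invented purely for illustration:

```python
# A made-up two-state N-stage Markov decision process
# (S_n, A_n, p_n, r_n, g_N); everything here is illustrative only.
N = 2                              # planning horizon

S = [0, 1]                         # state space (stage-independent here)
A = {0: [0, 1], 1: [0]}            # admissible actions A_n(i)

def p(n, i, a, j):
    """Transition probability p_n(i, a, j): action 1 moves the system
    to state 1 with certainty, action 0 keeps the current state."""
    if a == 1:
        return 1.0 if j == 1 else 0.0
    return 1.0 if j == i else 0.0

def r(n, i, a):
    """One-stage reward r_n(i, a): state 1 pays 1, acting costs 0.5."""
    return float(i) - 0.5 * a

def g(i):
    """Final reward g_N(i)."""
    return 2.0 * i

# Backward induction (Section 3.1): start from V_N = g_N and compute
# V_n(i) = max_{a in A_n(i)} [ r_n(i, a) + sum_j p_n(i, a, j) V_{n+1}(j) ].
V = {i: g(i) for i in S}
for n in reversed(range(N)):
    V = {i: max(r(n, i, a) + sum(p(n, i, a, j) * V[j] for j in S)
                for a in A[i])
         for i in S}
# V now holds the optimal expected total reward for each initial state
```

For these invented numbers the recursion yields V = {0: 2.5, 1: 4.0}: starting in state 0, it is optimal to pay the action cost once in order to reach the rewarding state 1.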
Thus, the manager faces the problem of finding an order policy that maximizes the profit. We next give a formulation of this and further problems in the framework of stochastic dynamic programming.

Example 2.1 (Inventory Control Problem with Backlogging). To obtain a simple mathematical formulation of the inventory control problem, we make the following assumptions: order decisions take place at the beginning of each month, and delivery occurs instantaneously. All customer orders that arrive during one month are satisfied at the end of the month. The capacity of the warehouse is B units. In this example, we assume that customer orders that could not be satisfied in one period are backlogged to the next month; thus, the inventory can become negative. Let i_n be the inventory on hand at stage n, which is the beginning of the (n + 1)th month; i_n can take values in the set S_n = {..., −1, 0, 1, ..., B}. An order of a_n units costs c_n a_n. The action space is A_n = {0, 1, 2, ...}. Since the warehouse capacity is restricted to B, no more than B − i_n items can be ordered; thus A_n(i_n) = {0, 1, ..., B − i_n}. The holding and penalty costs are L_n(i_n) ≥ 0 when the stock level is i_n. The immediate reward (the negative cost) of the system is therefore r_n(i_n, a_n) := −(c_n a_n + L_n(i_n + a_n)). The inventory after N months is worth d per unit, which gives a final reward of d · i when the final stock level is i. The demand for the items in each month is random. Let q_n(x) be the probability that x units are demanded in the (n + 1)th month. It is assumed that the sequential demands are stochastically

independent. The inventory at stage n + 1 is thus given by i_{n+1} = i_n + a_n − x_n. This implies that the transition probability at stage n is given by

    p_n(i, a, j) = q_n(i + a − j)   if j ≤ i + a,
    p_n(i, a, j) = 0                if j > i + a.    (1)

p_n(i, a, j) is the probability that the inventory is equal to j, given that the inventory one month earlier is i and a units have been ordered in the (n + 1)th month.

Example 2.2 (Inventory Control Problem without Backlogging). In this example, we assume that demands which cannot be satisfied immediately are lost. The difference in the mathematical formulation from the preceding example is as follows. The possible inventories at the beginning of the (n + 1)th month are given by S_n = {0, 1, ..., B}. If the demand in the (n + 1)th month is equal to x_n, then the inventory at the beginning of the next month is given by i_{n+1} = max(0, i_n + a_n − x_n). Hence, the transition probability is given by

    p_n(i, a, j) = Σ_{x ≥ i + a} q_n(x)   if j = 0,
    p_n(i, a, j) = q_n(i + a − j)         if 1 ≤ j ≤ i + a,
    p_n(i, a, j) = 0                      if j > i + a.    (2)

Example 2.3 (Replacement Problem). Replacement problems are another class of typical examples of Markov decision processes. The problem is as follows. A technical system (e.g. a machine) is in use over N years, and its condition deteriorates randomly. The reward of the machine depends on its condition. Each year the manager has to decide whether or not the system should be replaced by a new one. It is assumed that a replacement occurs without time delay. Suppose the state i_n gives the condition of the system at the beginning of the (n + 1)th year; S_n is a countable set with 0 ∈ S_n. If i_n = 0, the system is new. At the beginning of the (n + 1)th year, the manager has to decide upon the replacement. The action space can be defined by A_n = A = {0, 1}, with the interpretation that if a_n = 1, the system is replaced by a new one, and if a_n = 0, no replacement is undertaken. Obviously a replacement is always allowed, hence A_n(i) = A.
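The transition laws (1) and (2) of the two inventory examples are easy to check numerically. The following Python sketch (the demand distribution q and the capacity B are invented illustrative numbers) evaluates both laws and verifies that each defines a probability distribution over the next stock level:

```python
# Transition laws of Examples 2.1 (backlogging) and 2.2 (lost sales).
# q[x] = probability q_n(x) of a demand of x units; B = capacity.
# All numbers are illustrative only.
B = 3
q = {0: 0.2, 1: 0.5, 2: 0.3}

def p_backlog(i, a, j):
    """Eq. (1): with backlogging the next stock is j = i + a - demand."""
    if j > i + a:
        return 0.0
    return q.get(i + a - j, 0.0)

def p_lost_sales(i, a, j):
    """Eq. (2): unsatisfied demand is lost, so the stock is truncated at 0."""
    if j > i + a:
        return 0.0
    if j == 0:                      # any demand of at least i + a empties the stock
        return sum(qx for x, qx in q.items() if x >= i + a)
    return q.get(i + a - j, 0.0)

# each law defines a probability distribution over the next stock level
i, a = 1, 1                         # stock 1, order 1 (admissible since a <= B - i)
assert abs(sum(p_backlog(i, a, j)
               for j in range(i + a - max(q), i + a + 1)) - 1.0) < 1e-12
assert abs(sum(p_lost_sales(i, a, j) for j in range(0, B + 1)) - 1.0) < 1e-12
```

The j = 0 case of the lost-sales law collects the probability of all demands of at least i + a, which is exactly the effect of the truncation i_{n+1} = max(0, i_n + a_n − x_n).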
Now suppose q_n(i, j) gives the conditional probability that the machine condition deteriorates from i to j in the (n + 1)th year. The transition probability is thus defined by

    p_n(i, a, j) := q_n(0, j)   if a = 1,
    p_n(i, a, j) := q_n(i, j)   if a = 0.    (3)

The reward function is given by r_n. At the end of the planning horizon, the machine can be sold, and a reward g_N(i_N), depending on the state i_N, is obtained. The aim of the management is to find a replacement policy that maximizes the company's profit.

A decision rule is a function f_n : S_n → A_n with f_n(i) ∈ A_n(i) for all i ∈ S_n. It chooses an

admissible action at stage n depending on the state of the system. A decision rule of this type is called Markovian, since the decision depends only on the current state and not on the whole history. Moreover, the actions are chosen with certainty; because of this, the decision rule is called deterministic. There are cases where it is necessary to work with more general decision rules, which are history-dependent and/or randomized. A randomized decision rule specifies, in a given state, a probability distribution on the set of admissible actions; a random experiment then has to be carried out in order to determine the action. There are certain problems where only randomized decision rules are optimal (see also Stochastic Games). In our exposition, we restrict ourselves to deterministic Markovian decision rules.

A policy specifies the decision rules that have to be applied at the different stages. It is a sequence of decision rules, denoted by π = (f_0, f_1, ..., f_{N−1}). If policy π = (f_0, ..., f_{N−1}) is given and the system is in state i_n ∈ S_n, the controller selects action f_n(i_n) ∈ A_n(i_n), and the system moves to a new state i_{n+1} with probability p_n(i_n, f_n(i_n), i_{n+1}). Formally, the evolution of the system under a given policy π is described by a sequence of random variables X_0, X_1, ..., X_N which forms a (non-stationary) Markov chain with transition probabilities

    p_{f_n}(i_n, i_{n+1}) := p_n(i_n, f_n(i_n), i_{n+1})    (4)

for n = 0, 1, ..., N − 1.

A stochastic dynamic program is called stationary if the state spaces, the action spaces, the sets of admissible actions and the transition probabilities are independent of n, and the reward functions r_n and g_N have the form

    r_n(i, a) = β^n r(i, a)   and   g_N(i) = β^N g(i)    (5)

for some factor β ∈ (0, ∞). In this case, we drop the subscript n, and the problem is determined by the objects (S, A, p, r, g, β). These models play an important role when infinite horizon decision problems are considered.

## 3. Finite Horizon Decision Problems

In this section, we consider finite horizon Markov decision problems. We explain the important role of the Bellman equation and show how optimal policies can be computed with the backward induction algorithm. Under some assumptions, the monotonicity of optimal policies is also studied.

Given an initial state and a policy, a random stream of rewards is generated. As usual in the literature, we suppose that the objective is to maximize the expected sum of the rewards. More precisely, let i ∈ S_0 be the initial state, π a given policy and N the planning horizon. X_0, X_1, ..., X_N is the sequence of random states which are visited. The expected total reward over the planning horizon, if policy π is used and the system starts in state i, is given by

    V_π(i) = E_{π,i} [ Σ_{k=0}^{N−1} r_k(X_k, f_k(X_k)) + g_N(X_N) ],    (6)

where E_{π,i} denotes the expectation with respect to the probability measure defined by the policy π and the initial state i. Of course, the aim is to find a policy which maximizes the expected reward. This maximal expected reward is given by

    V(i) := sup_π V_π(i),   i ∈ S_0.    (7)

The function V(i) is called the value function or optimal value of the N-stage Markov decision process. A policy π* is called optimal if V_{π*}(i) = V(i) for all states i ∈ S_0. If S_n and A_n are finite for every stage n, such an optimal policy obviously exists and can be found by enumeration. In general, an optimal policy does not necessarily exist. In this case, one might be content with an ε-optimal policy: a policy π that satisfies V_π(i) ≥ V(i) − ε for all states i ∈ S_0.

## Bibliography

TO ACCESS ALL THE 23 PAGES OF THIS CHAPTER, Visit:

Bellman R.E. (1957). Dynamic Programming, 342 pp. Princeton, NJ: Princeton University Press. [First textbook on deterministic and stochastic dynamic models.]

Bertsekas D.P. (1995). Dynamic Programming and Optimal Control. Belmont, MA: Athena Scientific. [Textbook on deterministic and stochastic dynamic models.]

Hernández-Lerma O. and Lasserre J.B. (1996). Discrete-Time Markov Control Processes, 216 pp. New York: Springer-Verlag. [Recent book on Markov decision processes with general state spaces.]

Howard R.A. (1960). Dynamic Programming and Markov Processes, 136 pp. Cambridge, MA: MIT Press. [Early textbook on Markov decision processes.]

Puterman M. (1994). Markov Decision Processes, 649 pp. New York: Wiley. [Textbook on Markov decision processes with countable state spaces.]

Sennott L.I. (1999). Stochastic Dynamic Programming and the Control of Queues, 328 pp. New York: John Wiley & Sons. [This book considers Markov decision processes with queueing applications and has a strong computational emphasis.]

## Biographical Sketch

Ulrich Rieder, born in 1945, received the Ph.D.
degree in mathematics in 1972 (University of Hamburg) and the Habilitation in 1979 (University of Karlsruhe). Since 1980, he has been Full Professor of Mathematics and head of the Department of Optimization and Operations Research at the University of Ulm. His research interests include the analysis and control of stochastic processes with applications in

telecommunication, logistics, applied probability, finance and insurance. Dr. Rieder is Editor-in-Chief of the journal Mathematical Methods of Operations Research.


Equilibrium investment strategy for defined-contribution pension schemes with generalized mean-variance criterion and mortality risk Huiling Wu, Yan Zeng July 4, 2015 Outline Introduction Problem formulation

### Steel stacking. A problem in inventory management in the steel industry. João Pedro Pedroso

A problem in inventory management in the steel industry International Symposium on Mathematics of Logistics TUMSAT, Japan, November 2011 Part of the presentation concerns joint work with Rui Rei and Mikio

### Mathematical Programming Approach to Dynamic Resource Allocation Problems Ph.D. Thesis Proposal

UNIVERSIDAD CARLOS III DE MADRID DEPARTMENT OF STATISTICS Mathematical Programming Approach to Dynamic Resource Allocation Problems Ph.D. Thesis Proposal Peter Jacko November 18, 2005 UNIVERSIDAD CARLOS

### Convergence of Feller Processes

Chapter 15 Convergence of Feller Processes This chapter looks at the convergence of sequences of Feller processes to a iting process. Section 15.1 lays some ground work concerning weak convergence of processes

### 2014-2015 The Master s Degree with Thesis Course Descriptions in Industrial Engineering

2014-2015 The Master s Degree with Thesis Course Descriptions in Industrial Engineering Compulsory Courses IENG540 Optimization Models and Algorithms In the course important deterministic optimization

### Jeep problem and its 2D extension the solution with an application of the reinforcement learning

Jeep problem and its 2D extension the solution with an application of the reinforcement learning Marcin Pluciński Faculty of Computer Science and Information Technology, Szczecin University of Technology,

### Deterministic Problems

Chapter 2 Deterministic Problems 1 In this chapter, we focus on deterministic control problems (with perfect information), i.e, there is no disturbance w k in the dynamics equation. For such problems there

### The Lecture Contains: Application of stochastic processes in areas like manufacturing. Product(s)/Good(s) to be produced. Decision variables

The Lecture Contains: Application of stochastic processes in areas like manufacturing Product(s)/Good(s) to be produced Decision variables Structure of decision problem Demand Ordering/Production Cost

### Chapter 1. Introduction. 1.1 Motivation and Outline

Chapter 1 Introduction 1.1 Motivation and Outline In a global competitive market, companies are always trying to improve their profitability. A tool which has proven successful in order to achieve this

### Periodic Capacity Management under a Lead Time Performance Constraint

Periodic Capacity Management under a Lead Time Performance Constraint N.C. Buyukkaramikli 1,2 J.W.M. Bertrand 1 H.P.G. van Ooijen 1 1- TU/e IE&IS 2- EURANDOM INTRODUCTION Using Lead time to attract customers

### STRATEGIC CAPACITY PLANNING USING STOCK CONTROL MODEL

Session 6. Applications of Mathematical Methods to Logistics and Business Proceedings of the 9th International Conference Reliability and Statistics in Transportation and Communication (RelStat 09), 21

### ALGORITHMIC TRADING WITH MARKOV CHAINS

June 16, 2010 ALGORITHMIC TRADING WITH MARKOV CHAINS HENRIK HULT AND JONAS KIESSLING Abstract. An order book consists of a list of all buy and sell offers, represented by price and quantity, available

### Cardinality. The set of all finite strings over the alphabet of lowercase letters is countable. The set of real numbers R is an uncountable set.

Section 2.5 Cardinality (another) Definition: The cardinality of a set A is equal to the cardinality of a set B, denoted A = B, if and only if there is a bijection from A to B. If there is an injection

### The Analysis of Dynamical Queueing Systems (Background)

The Analysis of Dynamical Queueing Systems (Background) Technological innovations are creating new types of communication systems. During the 20 th century, we saw the evolution of electronic communication

### Abstract. 1. Introduction. Caparica, Portugal b CEG, IST-UTL, Av. Rovisco Pais, 1049-001 Lisboa, Portugal

Ian David Lockhart Bogle and Michael Fairweather (Editors), Proceedings of the 22nd European Symposium on Computer Aided Process Engineering, 17-20 June 2012, London. 2012 Elsevier B.V. All rights reserved.

### Effective Distribution Policies Utilizing Logistics Contracting

Effective Distribution Policies Utilizing Logistics Contracting Hyun-Soo Ahn Osman Engin Alper Philip Kaminsky Stephen M. Ross School of Business, University of Michigan, 701 Tappan Street, Ann Arbor,

### Revenue Management for Transportation Problems

Revenue Management for Transportation Problems Francesca Guerriero Giovanna Miglionico Filomena Olivito Department of Electronic Informatics and Systems, University of Calabria Via P. Bucci, 87036 Rende

### Introduction. Chapter 1

Chapter 1 Introduction The success of Japanese companies in the second half of the 20th century has lead to an increased interest in inventory management. Typically, these companies operated with far less

### Chapter 4. SDP - difficulties. 4.1 Curse of dimensionality

Chapter 4 SDP - difficulties So far we have discussed simple problems. Not necessarily in a conceptual framework, but surely in a computational. It makes little sense to discuss SDP, or DP for that matter,

### 6. Metric spaces. In this section we review the basic facts about metric spaces. d : X X [0, )

6. Metric spaces In this section we review the basic facts about metric spaces. Definitions. A metric on a non-empty set X is a map with the following properties: d : X X [0, ) (i) If x, y X are points

### Approximation Algorithms for Stochastic Inventory Control Models

Approximation Algorithms for Stochastic Inventory Control Models (Abstract) Retsef Levi Martin Pál Robin O. Roundy David B. Shmoys School of ORIE, Cornell University, Ithaca, NY 14853, USA DIMACS Center,

### OPTIMAL CONTROL OF FLEXIBLE SERVERS IN TWO TANDEM QUEUES WITH OPERATING COSTS

Probability in the Engineering and Informational Sciences, 22, 2008, 107 131. Printed in the U.S.A. DOI: 10.1017/S0269964808000077 OPTIMAL CONTROL OF FLEXILE SERVERS IN TWO TANDEM QUEUES WITH OPERATING

### Random access protocols for channel access. Markov chains and their stability. Laurent Massoulié.

Random access protocols for channel access Markov chains and their stability laurent.massoulie@inria.fr Aloha: the first random access protocol for channel access [Abramson, Hawaii 70] Goal: allow machines

### Forecasting Hospital Bed Availability Using Simulation and Neural Networks

Forecasting Hospital Bed Availability Using Simulation and Neural Networks Matthew J. Daniels Michael E. Kuhl Industrial & Systems Engineering Department Rochester Institute of Technology Rochester, NY

### CHAPTER III - MARKOV CHAINS

CHAPTER III - MARKOV CHAINS JOSEPH G. CONLON 1. General Theory of Markov Chains We have already discussed the standard random walk on the integers Z. A Markov Chain can be viewed as a generalization of

### A Simultaneous Deterministic Perturbation Actor-Critic Algorithm with an Application to Optimal Mortgage Refinancing

Proceedings of the 45th IEEE Conference on Decision & Control Manchester Grand Hyatt Hotel San Diego, CA, USA, December 13-15, 2006 A Simultaneous Deterministic Perturbation Actor-Critic Algorithm with

### Compensation of high interest rates by tax holidays: A real options approach

Compensation of high interest rates by tax holidays: A real options approach Vadim Arkin and Alexander Slastnikov There is an important problem how to attract investments to the real sector of economics

### Chapter 1 Introduction to Simulation. Banks, Carson, Nelson & Nicol Discrete-Event System Simulation

Chapter 1 Introduction to Simulation Banks, Carson, Nelson & Nicol Discrete-Event System Simulation Outline When Simulation Is the Appropriate Tool When Simulation Is Not Appropriate Advantages and Disadvantages

### Perceptron Learning Algorithm

Perceptron Learning Algorithm Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Separating Hyperplanes Construct linear decision boundaries that explicitly try to separate

### Warehouse Optimization in Uncertain Environment

Warehouse Optimization in Uncertain Environment Miroljub Kljaji, Davorin Kofja, Andrej Škraba, Valter Rejec 2 University of Maribor, Faculty of Organizational Sciences Cybernetics & DSS Laboratory Kidrieva

### Neuro-Dynamic Programming An Overview

1 Neuro-Dynamic Programming An Overview Dimitri Bertsekas Dept. of Electrical Engineering and Computer Science M.I.T. September 2006 2 BELLMAN AND THE DUAL CURSES Dynamic Programming (DP) is very broadly

### x if x 0, x if x < 0.

Chapter 3 Sequences In this chapter, we discuss sequences. We say what it means for a sequence to converge, and define the limit of a convergent sequence. We begin with some preliminary results about the

### Section 3 Sequences and Limits

Section 3 Sequences and Limits Definition A sequence of real numbers is an infinite ordered list a, a 2, a 3, a 4,... where, for each n N, a n is a real number. We call a n the n-th term of the sequence.

### Single item inventory control under periodic review and a minimum order quantity

Single item inventory control under periodic review and a minimum order quantity G. P. Kiesmüller, A.G. de Kok, S. Dabia Faculty of Technology Management, Technische Universiteit Eindhoven, P.O. Box 513,

### MATH 56A SPRING 2008 STOCHASTIC PROCESSES 31

MATH 56A SPRING 2008 STOCHASTIC PROCESSES 3.3. Invariant probability distribution. Definition.4. A probability distribution is a function π : S [0, ] from the set of states S to the closed unit interval

### S = {1, 2,..., n}. P (1, 1) P (1, 2)... P (1, n) P (2, 1) P (2, 2)... P (2, n) P = . P (n, 1) P (n, 2)... P (n, n)

last revised: 26 January 2009 1 Markov Chains A Markov chain process is a simple type of stochastic process with many social science applications. We ll start with an abstract description before moving

### Measuring the Performance of an Agent

25 Measuring the Performance of an Agent The rational agent that we are aiming at should be successful in the task it is performing To assess the success we need to have a performance measure What is rational