Reinforcement Learning

1 Reinforcement Learning: Solving MDPs (March/April 2015)

2 Brute Force
Solving an MDP means finding an optimal policy. A naive approach consists of enumerating all the deterministic Markov policies, evaluating each policy, and returning the best one. The number of policies is exponential: |A|^|S|. We need a more intelligent search for the best policies, e.g., restricting the search to a subset of the possible policies, or using stochastic optimization algorithms.
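To make the |A|^|S| blow-up concrete, here is a minimal sketch of the naive enumeration in Python/NumPy. It assumes a tabular MDP stored as arrays P[s, a, s'] (transition probabilities) and R[s, a] (rewards), with start distribution mu; these names are illustrative, not from the slides.

```python
import itertools
import numpy as np

def evaluate(policy, P, R, gamma):
    """Exact value of a deterministic policy: V = (I - gamma P_pi)^{-1} R_pi."""
    n = P.shape[0]
    P_pi = P[np.arange(n), policy]              # (n, n) transitions under the policy
    R_pi = R[np.arange(n), policy]              # (n,) rewards under the policy
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)

def brute_force(P, R, gamma, mu):
    """Enumerate all |A|^|S| deterministic Markov policies; keep the best."""
    n_states, n_actions = R.shape
    best_pi, best_val = None, -np.inf
    for pi in itertools.product(range(n_actions), repeat=n_states):
        v = evaluate(np.array(pi), P, R, gamma)
        if mu @ v > best_val:                   # score = expected value under mu
            best_pi, best_val = np.array(pi), mu @ v
    return best_pi, best_val
```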

3 What is Dynamic Programming?
Dynamic: a sequential or temporal component to the problem. Programming: optimizing a "program", i.e., a policy (c.f. linear programming). Dynamic programming is a method for solving complex problems by breaking them down into subproblems: solve the subproblems, then combine the solutions to the subproblems.

4 Requirements for Dynamic Programming
Dynamic programming is a very general solution method for problems which have two properties. Optimal substructure: the principle of optimality applies, so an optimal solution can be decomposed into subproblems. Overlapping subproblems: subproblems recur many times, so solutions can be cached and reused. Markov decision processes satisfy both properties: the Bellman equation gives a recursive decomposition, and the value function stores and reuses solutions.

5 Planning by Dynamic Programming
Dynamic programming assumes full knowledge of the MDP; it is used for planning in an MDP. Prediction — Input: MDP ⟨S, A, P, R, γ, µ⟩ and policy π (i.e., the MRP ⟨S, P^π, R^π, γ, µ⟩); Output: value function V^π. Control — Input: MDP ⟨S, A, P, R, γ, µ⟩; Output: optimal value function V* and optimal policy π*.

6 Other Applications of Dynamic Programming
Dynamic programming is used to solve many other problems: scheduling algorithms, string algorithms (e.g., sequence alignment), graph algorithms (e.g., shortest path algorithms), graphical models (e.g., the Viterbi algorithm), and bioinformatics (e.g., lattice models).

7 Finite Horizon
Principle of optimality: the tail of an optimal policy is optimal for the tail problem. Backward induction (backward recursion):

V_k(s) = max_{a ∈ A_k} [ R_k(s, a) + Σ_{s' ∈ S_{k+1}} P_k(s'|s, a) V_{k+1}(s') ],   ∀s, k = N−1, ..., 0

Optimal policy:

π_k(s) ∈ argmax_{a ∈ A_k} [ R_k(s, a) + Σ_{s' ∈ S_{k+1}} P_k(s'|s, a) V_{k+1}(s') ],   ∀s, k = 0, ..., N−1

Cost: N |S| |A| backups, versus the |A|^{N |S|} policies of brute-force policy search. From now on, we will consider infinite horizon discounted MDPs.
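A compact sketch of the backward recursion, again in NumPy. For brevity it assumes the rewards, transitions, and action sets are the same at every stage (R_k = R, P_k = P, A_k = A); handling stage-dependent arrays is a mechanical extension.

```python
import numpy as np

def backward_induction(P, R, N):
    """Finite-horizon DP: V_k(s) = max_a [ R(s,a) + sum_s' P(s'|s,a) V_{k+1}(s') ]."""
    n_states = R.shape[0]
    V = np.zeros(n_states)                      # V_N = 0: no terminal reward
    policy = np.zeros((N, n_states), dtype=int)
    for k in range(N - 1, -1, -1):
        Q = R + P @ V                           # Q[s, a]; P @ V sums over s'
        policy[k] = Q.argmax(axis=1)            # pi_k(s)
        V = Q.max(axis=1)                       # V_k(s)
    return V, policy                            # N * |S| * |A| backups in total
```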

8 Policy Evaluation
For a given policy π, compute the state value function V^π. Recall the state value function for policy π:

V^π(s) = E[ Σ_{t=0}^∞ γ^t r_t | s_0 = s ]

Bellman equation for V^π:

V^π(s) = Σ_{a ∈ A} π(a|s) [ R(s, a) + γ Σ_{s' ∈ S} P(s'|s, a) V^π(s') ]

A system of |S| simultaneous linear equations. Solution in matrix notation (complexity O(n³)):

V^π = (I − γ P^π)^{−1} R^π
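The closed-form solve, sketched for a stochastic tabular policy pi[s, a] (the array layout is an assumption, as before):

```python
import numpy as np

def policy_evaluation_exact(pi, P, R, gamma):
    """Solve (I - gamma P_pi) V = R_pi directly; O(n^3) in the number of states n."""
    P_pi = np.einsum('sa,sat->st', pi, P)       # P_pi[s,s'] = sum_a pi(a|s) P(s'|s,a)
    R_pi = np.einsum('sa,sa->s', pi, R)         # R_pi[s]    = sum_a pi(a|s) R(s,a)
    return np.linalg.solve(np.eye(len(R_pi)) - gamma * P_pi, R_pi)
```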

9 Iterative Policy Evaluation
Iterative application of the Bellman expectation backup:

V_0 → V_1 → ... → V_k → V_{k+1} → ... → V^π

A full policy evaluation backup:

V_{k+1}(s) ← Σ_{a ∈ A} π(a|s) [ R(s, a) + γ Σ_{s' ∈ S} P(s'|s, a) V_k(s') ]

A sweep consists of applying a backup operation to each state, using synchronous backups: at each iteration k+1, for all states s ∈ S, update V_{k+1}(s) from V_k(s').
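A minimal sketch of the synchronous sweep, iterated until the values stop moving (the tolerance-based stopping rule is a common choice, not something the slide prescribes):

```python
import numpy as np

def iterative_policy_evaluation(pi, P, R, gamma, tol=1e-8):
    """Repeat the Bellman expectation backup until successive sweeps agree."""
    V = np.zeros(R.shape[0])
    while True:
        Q = R + gamma * (P @ V)                 # Q[s,a] = R(s,a) + gamma sum_s' P V
        V_new = (pi * Q).sum(axis=1)            # weight actions by pi(a|s)
        if np.max(np.abs(V_new - V)) < tol:     # synchronous: V_new built from old V
            return V_new
        V = V_new
```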

10 Example: Small Gridworld
Undiscounted episodic MDP (γ = 1). All episodes terminate in an absorbing terminal state. Transient states 1, ..., 14; one terminal state (shown twice as shaded squares). Actions that would take the agent off the grid leave the state unchanged. Reward is −1 per step until the terminal state is reached.

11 Policy Evaluation in Small Gridworld

12 Policy Evaluation in Small Gridworld

13 Policy Improvement
Consider a deterministic policy π. For a given state s, would it be better to take an action a ≠ π(s)? We can improve the policy by acting greedily:

π'(s) = argmax_{a ∈ A} Q^π(s, a)

This improves the value from any state s over one step:

Q^π(s, π'(s)) = max_{a ∈ A} Q^π(s, a) ≥ Q^π(s, π(s)) = V^π(s)

14 Policy Improvement Theorem
Theorem. Let π and π' be any pair of deterministic policies such that Q^π(s, π'(s)) ≥ V^π(s) for all s ∈ S. Then the policy π' must be as good as, or better than, π: V^{π'}(s) ≥ V^π(s) for all s ∈ S.

Proof.
V^π(s) ≤ Q^π(s, π'(s)) = E_{π'}[ r_{t+1} + γ V^π(s_{t+1}) | s_t = s ]
≤ E_{π'}[ r_{t+1} + γ Q^π(s_{t+1}, π'(s_{t+1})) | s_t = s ]
≤ E_{π'}[ r_{t+1} + γ r_{t+2} + γ² Q^π(s_{t+2}, π'(s_{t+2})) | s_t = s ]
≤ ... ≤ E_{π'}[ r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ... | s_t = s ] = V^{π'}(s)

15 Policy Iteration
What if improvement stops (V^{π'} = V^π)? Then

Q^π(s, π'(s)) = max_{a ∈ A} Q^π(s, a) = Q^π(s, π(s)) = V^π(s)

But this is the Bellman optimality equation. Therefore V^π(s) = V^{π'}(s) = V*(s) for all s ∈ S, so π is an optimal policy! Policy iteration alternates evaluation and greedy improvement:

π_0 → V^{π_0} → π_1 → V^{π_1} → ... → π* → V^{π*}
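The whole loop fits in a few lines; a sketch combining the exact evaluation above with greedy improvement, terminating when the greedy policy stops changing:

```python
import numpy as np

def policy_iteration(P, R, gamma):
    """Alternate exact policy evaluation and greedy improvement until stable."""
    n_states = R.shape[0]
    pi = np.zeros(n_states, dtype=int)          # arbitrary initial deterministic policy
    while True:
        P_pi = P[np.arange(n_states), pi]       # evaluation: solve the linear system
        R_pi = R[np.arange(n_states), pi]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        pi_new = (R + gamma * (P @ V)).argmax(axis=1)   # improvement: greedy in Q^pi
        if np.array_equal(pi_new, pi):          # improvement stopped: pi is optimal
            return pi, V
        pi = pi_new
```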

16 Example of Jack's Car Rental
States: two locations, maximum of 20 cars each. Actions: move up to 5 cars between the two locations overnight. Reward: $10 for each car rented (the car must be available). Transitions: cars are returned and requested randomly, Poisson distributed — n returns/requests with probability (λⁿ / n!) e^{−λ}. First location: average requests = 3, average returns = 3. Second location: average requests = 4, average returns = 2.

17 Example of Jack's Car Rental

18 Modified Policy Iteration
Does policy evaluation need to converge to V^π? Or should we introduce a stopping condition, e.g., ɛ-convergence of the value function? Or simply stop after k iterations of iterative policy evaluation? (For example, in the small gridworld k = 3 was sufficient to achieve the optimal policy.) Why not update the policy every iteration, i.e., stop after k = 1? This is equivalent to value iteration.

19 Generalized Policy Iteration
Policy evaluation: estimate V^π, e.g., by iterative policy evaluation. Policy improvement: generate π' ≥ π, e.g., by greedy policy improvement.

20 Principle of Optimality
Deterministic Shortest Path Problem: reach the goal state g from the start state s_1 with minimum cost. Each action a from state s leads to a deterministic transition: P(s'|s, a) = 1 iff s' = succ(s, a). The reward R(s, a) is negative (it is a cost), and the problem is undiscounted: γ = 1.

Theorem. A path s_1, a_1, s_2, a_2, ..., s_T is optimal if and only if, for any intermediate state s_t along the solution path, the tail s_t, a_t, ..., s_T is an optimal path from s_t to g.

21 Principle of Optimality
Specifically, consider subdividing the path after one step. Then an optimal path from s consists of an optimal first action a*, followed by an optimal path from s' = succ(s, a*). Therefore the value of an optimal path must satisfy:

V*(s) = max_{a ∈ A} [ R(s, a) + V*(s') ],   s' = succ(s, a)

22 Deterministic Value Iteration
If we know the solution to the subproblems V*(s'), then it is easy to construct the solution to V*(s):

V(s) ← max_{a ∈ A} [ R(s, a) + V(s') ],   s' = succ(s, a)

The idea of value iteration is to apply these updates iteratively, e.g., starting from the goal and working backward.

23 Value Iteration in MDPs
Many MDPs don't have a finite horizon; they are typically loopy, so there is no end to work backwards from. However, we can still propagate information backwards, using the Bellman optimality equation to back up V(s) from V(s'). Each subproblem is easier due to the discount factor γ. Iterate until convergence.

24 Principle of Optimality in MDPs
Theorem. A policy π(a|s) achieves the optimal value from state s, V^π(s) = V*(s), if and only if, for any state s' reachable from s, π achieves the optimal value from state s': V^π(s') = V*(s').

So an optimal policy π*(a|s) must consist of an optimal first action a*, followed by an optimal policy from the successor state s':

V*(s) = max_{a ∈ A} [ R(s, a) + γ Σ_{s' ∈ S} P(s'|s, a) V*(s') ]

25 Value Iteration
Problem: find the optimal policy π*. Solution: iterative application of the Bellman optimality backup:

V_1 → V_2 → ... → V*

Using synchronous backups: at each iteration k+1, for all states s ∈ S, update V_{k+1}(s) from V_k(s'). Unlike policy iteration, there is no explicit policy, and intermediate value functions may not correspond to any policy. demo: poole/demos/mdp/vi.html
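A sketch of the loop, with a greedy policy read off only at the end (consistent with the remark that intermediate value functions need not correspond to any policy); the stopping threshold eps is an assumption, tied to the bound on the next slide:

```python
import numpy as np

def value_iteration(P, R, gamma, eps=1e-8):
    """Iterate the Bellman optimality backup until ||V_{i+1} - V_i|| < eps."""
    V = np.zeros(R.shape[0])
    while True:
        V_new = (R + gamma * (P @ V)).max(axis=1)   # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < eps:         # then ||V_new - V*|| < 2*eps*gamma/(1-gamma)
            break
        V = V_new
    pi = (R + gamma * (P @ V_new)).argmax(axis=1)   # greedy policy extracted at the end
    return pi, V_new
```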

26 Convergence and Contractions
Define the max norm: ‖V‖_∞ = max_s |V(s)|.

Theorem. Value iteration converges to the optimal state-value function: lim_{k→∞} V_k = V*.

Proof. ‖V_{k+1} − V*‖_∞ = ‖T V_k − T V*‖_∞ ≤ γ ‖V_k − V*‖_∞ ≤ ... ≤ γ^{k+1} ‖V_0 − V*‖_∞ → 0

Theorem. If ‖V_{i+1} − V_i‖_∞ < ɛ, then ‖V_{i+1} − V*‖_∞ < 2ɛγ / (1 − γ).

27 Synchronous Dynamic Programming Algorithms

Problem    | Bellman Equation                                   | Algorithm
Prediction | Bellman expectation equation                       | Iterative policy evaluation
Control    | Bellman expectation equation + greedy improvement  | Policy iteration
Control    | Bellman optimality equation                        | Value iteration

These algorithms are based on the state value function V^π(s) or V*(s); complexity O(mn²) per iteration, for m actions and n states. They could also be applied to the action value function Q^π(s, a) or Q*(s, a); complexity O(m²n²) per iteration.

28 Efficiency of DP
Finding an optimal policy is polynomial in the number of states... but the number of states is often astronomical, e.g., often growing exponentially with the number of state variables: the curse of dimensionality. In practice, classical DP can be applied to problems with a few million states. Asynchronous DP can be applied to larger problems and is appropriate for parallel computation. It is surprisingly easy to come up with MDPs for which DP methods are not practical.

29 Complexity of DP
DP methods are polynomial time algorithms for fixed discounted MDPs. Value iteration: O(|S|² |A|) for each iteration. Policy iteration: cost of policy evaluation + cost of policy improvement. Policy evaluation — solving the linear system directly: O(|S|³), or O(|S|^{2.373}) with fast matrix multiplication; iteratively: O(|S|² log(ɛ^{−1}) / log(γ^{−1})). Policy improvement: the number of improvement steps was recently proven to be O( (|A| / (1 − γ)) log(|S| / (1 − γ)) ). Each iteration of PI is computationally more expensive than each iteration of VI, but PI typically requires fewer iterations to converge than VI. This is exponentially faster than any direct policy search. The number of states often grows exponentially with the number of state variables.

30 Asynchronous Dynamic Programming
The DP methods described so far used synchronous backups, i.e., all states are backed up in parallel. Asynchronous DP backs up states individually, in any order: for each selected state, apply the appropriate backup. This can significantly reduce computation, and convergence is guaranteed if all states continue to be selected. Three ideas for asynchronous DP: in-place DP, prioritized sweeping, and real-time DP.

31 In-Place Dynamic Programming
Synchronous value iteration stores two copies of the value function: for all s ∈ S,

V_new(s) ← max_{a ∈ A} [ R(s, a) + γ Σ_{s' ∈ S} P(s'|s, a) V_old(s') ],   then V_old ← V_new

In-place value iteration stores only one copy of the value function: for all s ∈ S,

V(s) ← max_{a ∈ A} [ R(s, a) + γ Σ_{s' ∈ S} P(s'|s, a) V(s') ]
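The difference is a one-line change; a sketch of a single in-place sweep, where later backups within the sweep already see the values written by earlier ones:

```python
import numpy as np

def in_place_sweep(V, P, R, gamma):
    """One in-place sweep of value iteration: only one copy of V is stored."""
    for s in range(len(V)):
        V[s] = np.max(R[s] + gamma * P[s] @ V)  # immediately reuses fresh values
    return V
```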

32 Prioritized Sweeping
Use the magnitude of the Bellman error to guide state selection, e.g.,

| max_{a ∈ A} [ R(s, a) + γ Σ_{s' ∈ S} P(s'|s, a) V_k(s') ] − V(s) |

Back up the state with the largest remaining Bellman error, and update the Bellman error of the affected states after each backup. This requires knowledge of the reverse dynamics (predecessor states), and can be implemented efficiently by maintaining a priority queue.
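A sketch under the same array conventions; it simplifies the bookkeeping by leaving stale entries in the heap and re-checking the error on pop, which is a common lazy-deletion shortcut rather than the only way to maintain the queue:

```python
import heapq
import numpy as np

def prioritized_sweeping(P, R, gamma, max_backups=10_000, tol=1e-10):
    """Back up the state with the largest Bellman error first."""
    n_states = R.shape[0]
    V = np.zeros(n_states)
    # reverse dynamics: predecessors of each state, needed to propagate errors
    pred = [np.nonzero(P[:, :, s].sum(axis=1))[0] for s in range(n_states)]

    def bellman_error(s):
        return abs(np.max(R[s] + gamma * P[s] @ V) - V[s])

    heap = [(-bellman_error(s), s) for s in range(n_states)]
    heapq.heapify(heap)                         # max-heap via negated priorities
    for _ in range(max_backups):
        if not heap:
            break
        _, s = heapq.heappop(heap)
        if bellman_error(s) < tol:              # entry may be stale: re-check
            continue
        V[s] = np.max(R[s] + gamma * P[s] @ V)
        for p in pred[s]:                       # predecessors' errors changed
            heapq.heappush(heap, (-bellman_error(p), p))
    return V
```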

33 Real-Time Dynamic Programming
Idea: only back up states that are relevant to the agent, using the agent's experience to guide the selection of states. After each time step ⟨s_t, a_t, r_{t+1}⟩, select the greedy action

a_t ∈ argmax_{a ∈ A} [ R(s_t, a) + γ Σ_{s' ∈ S} P(s'|s_t, a) V(s') ]

and back up the state s_t:

V(s_t) ← max_{a ∈ A} [ R(s_t, a) + γ Σ_{s' ∈ S} P(s'|s_t, a) V(s') ]

Theorem. If V_0 ≥ V*, then there exists t̄ such that a_t is optimal for all t ≥ t̄ (where t̄ < ∞ with probability 1).
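One RTDP step might look as follows; sampling the next state from the model is an assumption made for the sketch (in a real system the environment supplies it):

```python
import numpy as np

def rtdp_step(s, V, P, R, gamma, rng):
    """Greedy action at the current state, backup of that state only, then move on."""
    q = R[s] + gamma * P[s] @ V                 # one-step lookahead values at s
    a = int(np.argmax(q))                       # a_t: greedy action
    V[s] = q[a]                                 # back up only the visited state
    return rng.choice(len(V), p=P[s, a])        # sample s_{t+1} from the model

# usage: s = rtdp_step(s, V, P, R, gamma, np.random.default_rng(0))
```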

34 Full Width Backups
Dynamic programming uses full-width backups: for each backup (synchronous or asynchronous), every successor state and action is considered, using knowledge of the MDP transition and reward functions. Dynamic programming is effective for medium-size problems (millions of states). For large problems dynamic programming suffers from Bellman's curse of dimensionality: the number of states n = |S| grows exponentially with the number of state variables, and even one backup can be too expensive.

35 Sample Backups
Reinforcement learning techniques exploit sample backups. Sample backups do not use the reward function R and the transition dynamics P; instead they use sample rewards and sample transitions ⟨s, a, s', r⟩. Advantages: model-free (no prior knowledge of the MDP required); breaks the curse of dimensionality through sampling; the cost of a backup is constant, independent of n = |S|.

36 Approximate Dynamic Programming
Approximate the value function using a function approximator V_θ(s) = f(s, θ), and apply dynamic programming to V_θ. For example, fitted value iteration repeats at each iteration k: sample states S̃ ⊆ S; for each state s ∈ S̃, estimate the target value using the Bellman optimality equation,

Ṽ_k(s) = max_{a ∈ A} [ R(s, a) + γ Σ_{s' ∈ S} P(s'|s, a) V_{θ_k}(s') ];

then train the next value function V_{θ_{k+1}} using the targets { ⟨s, Ṽ_k(s)⟩ }.
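A sketch with a linear approximator V_θ(s) = φ(s)ᵀθ fitted by least squares; Phi_all (features of every state, used to evaluate V_θ at successors) and S_hat (indices of the sampled states) are illustrative names, and the choice of regression is an assumption:

```python
import numpy as np

def fitted_value_iteration(S_hat, Phi_all, P, R, gamma, n_iters=100):
    """Fitted VI with V_theta = Phi_all @ theta; regression only on sampled states."""
    theta = np.zeros(Phi_all.shape[1])
    for _ in range(n_iters):
        V = Phi_all @ theta                     # current V_theta at all states
        targets = (R[S_hat] + gamma * (P[S_hat] @ V)).max(axis=1)   # Bellman targets
        theta, *_ = np.linalg.lstsq(Phi_all[S_hat], targets, rcond=None)
    return theta
```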

37 Infinite Horizon Linear Programming
Recall that at value iteration convergence we have, for all s ∈ S:

V*(s) = max_{a ∈ A} [ R(s, a) + γ Σ_{s' ∈ S} P(s'|s, a) V*(s') ]

LP formulation to find V*:

min_V Σ_{s ∈ S} µ(s) V(s)
s.t. V(s) ≥ R(s, a) + γ Σ_{s' ∈ S} P(s'|s, a) V(s'),   ∀s ∈ S, a ∈ A

There are |S| variables and |S| × |A| constraints.

Theorem. V* is the solution of the above LP.
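A sketch of this primal LP using scipy.optimize.linprog; rearranging each constraint as (γ P(·|s, a) − e_s)ᵀ V ≤ −R(s, a) puts it in the solver's A_ub V ≤ b_ub form, and V must be declared free since linprog defaults to nonnegative variables:

```python
import numpy as np
from scipy.optimize import linprog

def solve_mdp_primal_lp(P, R, gamma, mu):
    """min mu^T V  s.t.  V(s) >= R(s,a) + gamma sum_s' P(s'|s,a) V(s') for all s, a."""
    n_s, n_a = R.shape
    A_ub = gamma * P.reshape(n_s * n_a, n_s)    # one row per (s, a) pair, C order
    A_ub[np.arange(n_s * n_a), np.repeat(np.arange(n_s), n_a)] -= 1.0   # subtract e_s
    res = linprog(c=mu, A_ub=A_ub, b_ub=-R.reshape(-1),
                  bounds=[(None, None)] * n_s)  # V is free, not nonnegative
    return res.x                                # V*, assuming the solver succeeds
```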

38 Theorem Proof
Let T be the optimal Bellman operator; then the LP can be written as:

min_V µᵀ V   s.t.   V ≥ T(V)

Monotonicity property: if U ≥ V then T(U) ≥ T(V). Hence, if V ≥ T(V), then T(V) ≥ T(T(V)), and by repeated application,

V ≥ T(V) ≥ T²(V) ≥ T³(V) ≥ ... ≥ T^∞(V) = V*

Any feasible solution to the LP must satisfy V ≥ T(V), and hence must satisfy V ≥ V*. Hence, assuming all entries of µ are positive, V* is the optimal solution to the LP.

39 Dual Program

max_λ Σ_{s ∈ S} Σ_{a ∈ A} λ(s, a) R(s, a)      (1)
s.t. Σ_{a' ∈ A} λ(s', a') = µ(s') + γ Σ_{s ∈ S} Σ_{a ∈ A} λ(s, a) P(s'|s, a),   ∀s' ∈ S      (2)
     λ(s, a) ≥ 0,   ∀s ∈ S, a ∈ A

Interpretation: λ(s, a) = Σ_{t=0}^∞ γ^t P(s_t = s, a_t = a). Equation (2) ensures λ has the above meaning; equation (1) maximizes the expected discounted sum of rewards. Optimal policy: π*(s) = argmax_a λ(s, a).
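The same solver handles the dual; a sketch that builds the flow-balance constraints (2) over the flattened (s, a) variables and reads the optimal policy off the visitation frequencies:

```python
import numpy as np
from scipy.optimize import linprog

def solve_mdp_dual_lp(P, R, gamma, mu):
    """max sum lambda(s,a) R(s,a)  s.t. the discounted visitation-flow constraints."""
    n_s, n_a = R.shape
    # A_eq[s', (s,a)] = 1{s = s'} - gamma * P(s'|s,a); variables flattened in C order
    A_eq = np.kron(np.eye(n_s), np.ones(n_a)) - gamma * P.reshape(n_s * n_a, n_s).T
    res = linprog(c=-R.reshape(-1),             # linprog minimizes, so negate rewards
                  A_eq=A_eq, b_eq=mu)           # default bounds give lambda >= 0
    lam = res.x.reshape(n_s, n_a)               # assuming the solver succeeds
    return lam.argmax(axis=1)                   # pi*(s) = argmax_a lambda(s, a)
```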

40 Complexity of LP
LP worst-case convergence guarantees are better than those of DP methods, but LP methods become impractical at a much smaller number of states than DP methods do.
