# An Introduction to Markov Decision Processes. MDP Tutorial - 1

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 An Introduction to Markov Decision Processes Bob Givan Purdue University Ron Parr Duke University MDP Tutorial - 1

2 Outline Markov Decision Processes defined (Bob) Objective functions Policies Finding Optimal Solutions (Ron) Dynamic programming Linear programming Refinements to the basic model (Bob) Partial observability Factored representations MDP Tutorial - 2

3 Stochastic Automata with Utilities A Markov Decision Process (MDP) model contains: A set of possible world states S A set of possible actions A A real valued reward function R(s,a) A description T of each action s effects in each state. We assume the Markov Property: the effects of an action taken in a state depend only on that state and not on the prior history. MDP Tutorial - 3

4 Stochastic Automata with Utilities A Markov Decision Process (MDP) model contains: A set of possible world states S A set of possible actions A A real valued reward function R(s,a) A description T of each action s effects in each state. We assume the Markov Property: the effects of an action taken in a state depend only on that state and not on the prior history. MDP Tutorial - 4

5 Representing Actions Deterministic Actions: T : S A S For each state and action we specify a new state. Stochastic Actions: T : S A Prob( S) For each state and action we specify a probability distribution over next states. Represents the distribution P(s s, a) MDP Tutorial - 5

6 Representing Actions Deterministic Actions: T : S A S For each state and action we specify a new state. Stochastic Actions: T : S A Prob( S) For each state and action we specify a probability distribution over next states. Represents the distribution P(s s, a). 1.0 MDP Tutorial - 6

7 Representing Solutions A policy π is a mapping from S to A MDP Tutorial - 7

8 Following a Policy Following a policy π: 1. Determine the current state s 2. Execute action π(s) 3. Goto step 1. Assumes full observability: the new state resulting from executing an action will be known to the system MDP Tutorial - 8

9 Evaluating a Policy How good is a policy π in a state s? For deterministic actions just total the rewards obtained... but result may be infinite. For stochastic actions, instead expected total reward obtained again typically yields infinite value. How do we compare policies of infinite value? MDP Tutorial - 9

10 Objective Functions An objective function maps infinite sequences of rewards to single real numbers (representing utility) Options: 1. Set a finite horizon and just total the reward 2. Discounting to prefer earlier rewards 3. Average reward rate in the limit Discounting is perhaps the most analytically tractable and most widely studied approach MDP Tutorial - 10

11 Discounting A reward n steps away is discounted by for discount rate 0 < γ < 1. models mortality: you may die at any moment models preference for shorter solutions a smoothed out version of limited horizon lookahead γ n We use cumulative discounted reward as our objective (Max value <= M γ M γ M +... = ) 1 γ M MDP Tutorial - 11

12 Value Functions A value function V π : S R represents the expected objective value obtained following policy π from each state in S. Value functions partially order the policies, but at least one optimal policy exists, and all optimal policies have the same value function, V * MDP Tutorial - 12

13 Bellman Equations Bellman equations relate the value function to itself via the problem dynamics. For the discounted objective function, V π ( s) = R( s, π( s) ) + T ( s, π( s), s ) γ V π ( s ) s S V * ( s) = MAX a A R sa, ( ) + T ( sas,, ) γ V * ( s ) s S In each case, there is one equation per state in S MDP Tutorial - 13

14 Finite-horizon Bellman Equations Finite-horizon values at adjacent horizons are related by the action dynamics V π, 0 ( s) = R( s, π( s) ) V π, n ( s) = R sa, ( ) + T ( sas,, ) γ V π, n 1 ( s ) s S MDP Tutorial - 14

15 Relation to Model Checking Some thoughts on the relationship MDP solution focuses critically on expected value Contrast safety properties which focus on worst case This contrast allows MDP methods to exploit sampling and approximation more aggressively MDP Tutorial - 15

16 At this point, Ron Parr spoke on solution methods for about 1/2 an hour, and then I continued. MDP Tutorial - 16

17 Large State Spaces In AI problems, the state space is typically astronomically large described implicitly, not enumerated decomposed into factors, or aspects of state Issues raised: How can we represent reward and action behaviors in such MDPs? How can we find solutions in such MDPs? MDP Tutorial - 17

18 A Factored MDP Representation State Space S assignments to state variables: On-Mars?, Need-Power?, Daytime?,..etc... Partitions each block a DNF formula (or BDD, etc) Block 1: not On-Mars? Block 2: On-Mars? and Need-Power? Block 3: On-Mars? and not Need-Power? Reward function R labelled state-space partition: Block 1: not On-Mars? Reward=0 Block 2: On-Mars? and Need-Power? Reward=4 Block 3: On-Mars? and not Need-Power?.. Reward=5 MDP Tutorial - 18

19 Factored Representations of Actions Assume: actions affect state variables independently. 1 e.g...pr(nd-power? ^ On-Mars? x, a) = Pr (Nd-Power? x, a) * Pr (On-Mars? x, a) Represent effect on each state variable as labelled partition: Effects of Action Charge-Battery on variable Need-Power? Pr(Need-Power? Block n) Block 1: not On-Mars? Block 2: On-Mars? and Need-Power? Block 3: On-Mars? and not Need-Power? This assumption can be relaxed. MDP Tutorial - 19

20 Representing Blocks Identifying irrelevant state variables Decision trees DNF formulas Binary/Algebraic Decision Diagrams MDP Tutorial - 20

21 Partial Observability System state can not always be determined a Partially Observable MDP (POMDP) Action outcomes are not fully observable Add a set of observations O to the model Add an observation distribution U(s,o) for each state Add an initial state distribution I Key notion: belief state, a distribution over system states representing where I think I am MDP Tutorial - 21

22 POMDP to MDP Conversion Belief state Pr(x) can be updated to Pr(x o) using Bayes rule: Pr(s s,o) = Pr(o s,s ) Pr(s s) / Pr(o s) = U(s,o) T(s,a,s) normalized Pr(s o) = Pr(s s,o) Pr(s) A POMDP is Markovian and fully observable relative to the belief state. a POMDP can be treated as a continuous state MDP MDP Tutorial - 22

23 Belief State Approximation Problem: When MDP state space is astronomical, belief states cannot be explicitly represented. Consequence: MDP conversion of POMDP impractical Solution: Represent belief state approximately Typically exploiting factored state representation Typically exploiting (near) conditional independence properties of the belief state factors MDP Tutorial - 23

### Preliminaries: Problem Definition Agent model, POMDP, Bayesian RL

POMDP Tutorial Preliminaries: Problem Definition Agent model, POMDP, Bayesian RL Observation Belief ACTOR Transition Dynamics WORLD b Policy π Action Markov Decision Process -X: set of states [x s,x r

### Informatics 2D Reasoning and Agents Semester 2, 2015-16

Informatics 2D Reasoning and Agents Semester 2, 2015-16 Alex Lascarides alex@inf.ed.ac.uk Lecture 30 Markov Decision Processes 25th March 2016 Informatics UoE Informatics 2D 1 Sequential decision problems

### Planning in Artificial Intelligence

Logical Representations and Computational Methods for Marov Decision Processes Craig Boutilier Department of Computer Science University of Toronto Planning in Artificial Intelligence Planning has a long

### Solving Hybrid Markov Decision Processes

Solving Hybrid Markov Decision Processes Alberto Reyes 1, L. Enrique Sucar +, Eduardo F. Morales + and Pablo H. Ibargüengoytia Instituto de Investigaciones Eléctricas Av. Reforma 113, Palmira Cuernavaca,

### Online Planning Algorithms for POMDPs

Journal of Artificial Intelligence Research 32 (2008) 663-704 Submitted 03/08; published 07/08 Online Planning Algorithms for POMDPs Stéphane Ross Joelle Pineau School of Computer Science McGill University,

### Modeling Treatment of Ischemic Heart Disease with Partially Observable Markov Decision Processes.

Modeling Treatment of Ischemic Heart Disease with Partially Observable Markov Decision Processes. Milos Hauskrecht,HamishFraser 2;3 Computer Science Department, Box 90, Brown University, Providence, RI

### Lecture 1: Introduction to Reinforcement Learning

Lecture 1: Introduction to Reinforcement Learning David Silver Outline 1 Admin 2 About Reinforcement Learning 3 The Reinforcement Learning Problem 4 Inside An RL Agent 5 Problems within Reinforcement Learning

### Computing Near Optimal Strategies for Stochastic Investment Planning Problems

Computing Near Optimal Strategies for Stochastic Investment Planning Problems Milos Hauskrecfat 1, Gopal Pandurangan 1,2 and Eli Upfal 1,2 Computer Science Department, Box 1910 Brown University Providence,

### Reinforcement Learning

Reinforcement Learning LU 2 - Markov Decision Problems and Dynamic Programming Dr. Martin Lauer AG Maschinelles Lernen und Natürlichsprachliche Systeme Albert-Ludwigs-Universität Freiburg martin.lauer@kit.edu

### Reinforcement Learning

Reinforcement Learning LU 2 - Markov Decision Problems and Dynamic Programming Dr. Joschka Bödecker AG Maschinelles Lernen und Natürlichsprachliche Systeme Albert-Ludwigs-Universität Freiburg jboedeck@informatik.uni-freiburg.de

### OPTIMIZATION AND OPERATIONS RESEARCH Vol. IV - Markov Decision Processes - Ulrich Rieder

MARKOV DECISIO PROCESSES Ulrich Rieder University of Ulm, Germany Keywords: Markov decision problem, stochastic dynamic program, total reward criteria, average reward, optimal policy, optimality equation,

### BINOMIAL OPTIONS PRICING MODEL. Mark Ioffe. Abstract

BINOMIAL OPTIONS PRICING MODEL Mark Ioffe Abstract Binomial option pricing model is a widespread numerical method of calculating price of American options. In terms of applied mathematics this is simple

### Chi-square Tests Driven Method for Learning the Structure of Factored MDPs

Chi-square Tests Driven Method for Learning the Structure of Factored MDPs Thomas Degris Thomas.Degris@lip6.fr Olivier Sigaud Olivier.Sigaud@lip6.fr Pierre-Henri Wuillemin Pierre-Henri.Wuillemin@lip6.fr

### strategic white paper

strategic white paper AUTOMATED PLANNING FOR REMOTE PENETRATION TESTING Lloyd Greenwald and Robert Shanley LGS Innovations / Bell Labs Florham Park, NJ US In this work we consider the problem of automatically

### Stochastic Dynamic Programming with Factored Representations

Stochastic Dynamic Programming with Factored Representations Craig Boutiliery Department of Computer Science Richard Dearden Department of Computer Science University of Toronto Toronto, ON, M5S 3H5, CANADA

### An Application of Inverse Reinforcement Learning to Medical Records of Diabetes Treatment

An Application of Inverse Reinforcement Learning to Medical Records of Diabetes Treatment Hideki Asoh 1, Masanori Shiro 1 Shotaro Akaho 1, Toshihiro Kamishima 1, Koiti Hasida 1, Eiji Aramaki 2, and Takahide

### Stochastic Models for Inventory Management at Service Facilities

Stochastic Models for Inventory Management at Service Facilities O. Berman, E. Kim Presented by F. Zoghalchi University of Toronto Rotman School of Management Dec, 2012 Agenda 1 Problem description Deterministic

### Planning Treatment of Ischemic Heart Disease with Partially Observable Markov Decision Processes

Artificial Intelligence in Medicine, vol 18 (2000), pp. 221 244 Submitted 5/99; Accepted 09/99 Planning Treatment of Ischemic Heart Disease with Partially Observable Markov Decision Processes Milos Hauskrecht

### Valuation of the Minimum Guaranteed Return Embedded in Life Insurance Products

Financial Institutions Center Valuation of the Minimum Guaranteed Return Embedded in Life Insurance Products by Knut K. Aase Svein-Arne Persson 96-20 THE WHARTON FINANCIAL INSTITUTIONS CENTER The Wharton

### Best-response play in partially observable card games

Best-response play in partially observable card games Frans Oliehoek Matthijs T. J. Spaan Nikos Vlassis Informatics Institute, Faculty of Science, University of Amsterdam, Kruislaan 43, 98 SJ Amsterdam,

### A survey of point-based POMDP solvers

DOI 10.1007/s10458-012-9200-2 A survey of point-based POMDP solvers Guy Shani Joelle Pineau Robert Kaplow The Author(s) 2012 Abstract The past decade has seen a significant breakthrough in research on

### AN ALGEBRAIC APPROACH TO ABSTRACTION IN REINFORCEMENT LEARNING

AN ALGEBRAIC APPROACH TO ABSTRACTION IN REINFORCEMENT LEARNING A Dissertation Presented by BALARAMAN RAVINDRAN Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment

### Optimal Design of Sequential Real-Time Communication Systems Aditya Mahajan, Member, IEEE, and Demosthenis Teneketzis, Fellow, IEEE

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 11, NOVEMBER 2009 5317 Optimal Design of Sequential Real-Time Communication Systems Aditya Mahajan, Member, IEEE, Demosthenis Teneketzis, Fellow, IEEE

### Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning

Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning Richard S. Sutton a, Doina Precup b, and Satinder Singh a a AT&T Labs Research, 180 Park Avenue, Florham Park,

### Final Exam, Spring 2007

10-701 Final Exam, Spring 2007 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 16 numbered pages in this exam (including this cover sheet). 3. You can use any material you brought:

### Lecture Note 1 Set and Probability Theory. MIT 14.30 Spring 2006 Herman Bennett

Lecture Note 1 Set and Probability Theory MIT 14.30 Spring 2006 Herman Bennett 1 Set Theory 1.1 Definitions and Theorems 1. Experiment: any action or process whose outcome is subject to uncertainty. 2.

### POMPDs Make Better Hackers: Accounting for Uncertainty in Penetration Testing. By: Chris Abbott

POMPDs Make Better Hackers: Accounting for Uncertainty in Penetration Testing By: Chris Abbott Introduction What is penetration testing? Methodology for assessing network security, by generating and executing

### On Adversarial Policy Switching with Experiments in Real-time Strategy Games

On Adversarial Policy Switching with Experiments in Real-time Strategy Games Brian King 1 and Alan Fern 2 and Jesse Hostetler 2 Department of Electrical Engineering and Computer Science Oregon State University

CS 252 - Markov Chains Additional Reading 1 and Homework problems 1 Markov Chains The first model we discussed, the DFA, is deterministic it s transition function must be total and it allows for transition

### Probabilistic Model Checking at Runtime for the Provisioning of Cloud Resources

Probabilistic Model Checking at Runtime for the Provisioning of Cloud Resources Athanasios Naskos, Emmanouela Stachtiari, Panagiotis Katsaros, and Anastasios Gounaris Aristotle University of Thessaloniki,

### Optimization under uncertainty: modeling and solution methods

Optimization under uncertainty: modeling and solution methods Paolo Brandimarte Dipartimento di Scienze Matematiche Politecnico di Torino e-mail: paolo.brandimarte@polito.it URL: http://staff.polito.it/paolo.brandimarte

### Lecture 6: Option Pricing Using a One-step Binomial Tree. Friday, September 14, 12

Lecture 6: Option Pricing Using a One-step Binomial Tree An over-simplified model with surprisingly general extensions a single time step from 0 to T two types of traded securities: stock S and a bond

### Markov Decision Processes for Ad Network Optimization

Markov Decision Processes for Ad Network Optimization Flávio Sales Truzzi 1, Valdinei Freire da Silva 2, Anna Helena Reali Costa 1, Fabio Gagliardi Cozman 3 1 Laboratório de Técnicas Inteligentes (LTI)

### TD(0) Leads to Better Policies than Approximate Value Iteration

TD(0) Leads to Better Policies than Approximate Value Iteration Benjamin Van Roy Management Science and Engineering and Electrical Engineering Stanford University Stanford, CA 94305 bvr@stanford.edu Abstract

### Learning the Structure of Factored Markov Decision Processes in Reinforcement Learning Problems

Learning the Structure of Factored Markov Decision Processes in Reinforcement Learning Problems Thomas Degris Thomas.Degris@lip6.fr Olivier Sigaud Olivier.Sigaud@lip6.fr Pierre-Henri Wuillemin Pierre-Henri.Wuillemin@lip6.fr

### Q1. [19 pts] Cheating at cs188-blackjack

CS 188 Spring 2012 Introduction to Artificial Intelligence Practice Midterm II Solutions Q1. [19 pts] Cheating at cs188-blackjack Cheating dealers have become a serious problem at the cs188-blackjack tables.

### Safe robot motion planning in dynamic, uncertain environments

Safe robot motion planning in dynamic, uncertain environments RSS 2011 Workshop: Guaranteeing Motion Safety for Robots June 27, 2011 Noel du Toit and Joel Burdick California Institute of Technology Dynamic,

### Development of dynamically evolving and self-adaptive software. 1. Background

Development of dynamically evolving and self-adaptive software 1. Background LASER 2013 Isola d Elba, September 2013 Carlo Ghezzi Politecnico di Milano Deep-SE Group @ DEIB 1 Requirements Functional requirements

### Scalable Planning and Learning for Multiagent POMDPs

Scalable Planning and Learning for Multiagent POMDPs Christopher Amato CSAIL MIT camato@csail.mit.edu Frans A. Oliehoek Informatics Institute, University of Amsterdam DKE, Maastricht University f.a.oliehoek@uva.nl

### Motivation. Motivation. Can a software agent learn to play Backgammon by itself? Machine Learning. Reinforcement Learning

Motivation Machine Learning Can a software agent learn to play Backgammon by itself? Reinforcement Learning Prof. Dr. Martin Riedmiller AG Maschinelles Lernen und Natürlichsprachliche Systeme Institut

### A control Lyapunov function approach for the computation of the infinite-horizon stochastic reach-avoid problem

A control Lyapunov function approach for the computation of the infinite-horizon stochastic reach-avoid problem Ilya Tkachev and Alessandro Abate Abstract This work is devoted to the solution of the stochastic

### Using Markov Decision Processes to Solve a Portfolio Allocation Problem

Using Markov Decision Processes to Solve a Portfolio Allocation Problem Daniel Bookstaber April 26, 2005 Contents 1 Introduction 3 2 Defining the Model 4 2.1 The Stochastic Model for a Single Asset.........................

### Predictive Representations of State

Predictive Representations of State Michael L. Littman Richard S. Sutton AT&T Labs Research, Florham Park, New Jersey {sutton,mlittman}@research.att.com Satinder Singh Syntek Capital, New York, New York

### Mathematical Programming Approach to Dynamic Resource Allocation Problems Ph.D. Thesis Proposal

UNIVERSIDAD CARLOS III DE MADRID DEPARTMENT OF STATISTICS Mathematical Programming Approach to Dynamic Resource Allocation Problems Ph.D. Thesis Proposal Peter Jacko November 18, 2005 UNIVERSIDAD CARLOS

### Today s Agenda. Automata and Logic. Quiz 4 Temporal Logic. Introduction Buchi Automata Linear Time Logic Summary

Today s Agenda Quiz 4 Temporal Logic Formal Methods in Software Engineering 1 Automata and Logic Introduction Buchi Automata Linear Time Logic Summary Formal Methods in Software Engineering 2 1 Buchi Automata

### MODELING CUSTOMER RELATIONSHIPS AS MARKOV CHAINS. Journal of Interactive Marketing, 14(2), Spring 2000, 43-55

MODELING CUSTOMER RELATIONSHIPS AS MARKOV CHAINS Phillip E. Pfeifer and Robert L. Carraway Darden School of Business 100 Darden Boulevard Charlottesville, VA 22903 Journal of Interactive Marketing, 14(2),

### DiPro - A Tool for Probabilistic Counterexample Generation

DiPro - A Tool for Probabilistic Counterexample Generation Husain Aljazzar, Florian Leitner-Fischer, Stefan Leue, and Dimitar Simeonov University of Konstanz, Germany Abstract. The computation of counterexamples

### Techniques for Supporting Prediction of Security Breaches in. Critical Cloud Infrastructures Using Bayesian Network and. Markov Decision Process

Techniques for Supporting Prediction of Security Breaches in Critical Cloud Infrastructures Using Bayesian Network and Markov Decision Process by Vinjith Nagaraja A Thesis Presentation in Partial Fulfillment

### Chapter 11 Monte Carlo Simulation

Chapter 11 Monte Carlo Simulation 11.1 Introduction The basic idea of simulation is to build an experimental device, or simulator, that will act like (simulate) the system of interest in certain important

### Formal decision analysis has been increasingly

TUTORIAL Markov Decision Processes: A Tool for Sequential Decision Making under Uncertainty Oguzhan Alagoz, PhD, Heather Hsu, MS, Andrew J. Schaefer, PhD, Mark S. Roberts, MD, MPP We provide a tutorial

### Policy Iteration for Factored MDPs

326 UNCERTAINTY IN ARTIFICIAL INTELLIGENCE PROCEEDINGS 2000 Policy Iteration for Factored MDPs Daphne Koller Computer Science Department Stanford University Stanford, CA 94305-9010 koller@cs. stanford.

### Tagging with Hidden Markov Models

Tagging with Hidden Markov Models Michael Collins 1 Tagging Problems In many NLP problems, we would like to model pairs of sequences. Part-of-speech (POS) tagging is perhaps the earliest, and most famous,

### Human Behavior Modeling with Maximum Entropy Inverse Optimal Control

Human Behavior Modeling with Maximum Entropy Inverse Optimal Control Brian D. Ziebart, Andrew Maas, J.Andrew Bagnell, and Anind K. Dey School of Computer Science Carnegie Mellon University Pittsburgh,

### Hidden Markov Models for biological systems

Hidden Markov Models for biological systems 1 1 1 1 0 2 2 2 2 N N KN N b o 1 o 2 o 3 o T SS 2005 Heermann - Universität Heidelberg Seite 1 We would like to identify stretches of sequences that are actually

### Artificially Intelligent Adaptive Tutoring System

Artificially Intelligent Adaptive Tutoring System William Whiteley Stanford School of Engineering Stanford, CA 94305 bill.whiteley@gmail.com Abstract This paper presents an adaptive tutoring system based

### ADVANTAGE UPDATING. Leemon C. Baird III. Technical Report WL-TR Wright Laboratory. Wright-Patterson Air Force Base, OH

ADVANTAGE UPDATING Leemon C. Baird III Technical Report WL-TR-93-1146 Wright Laboratory Wright-Patterson Air Force Base, OH 45433-7301 Address: WL/AAAT, Bldg 635 2185 Avionics Circle Wright-Patterson Air

### Economics 702 Macroeconomic Theory Practice Examination #1 October 2008

Economics 702 Macroeconomic Theory Practice Examination #1 October 2008 Instructions: There are a total of 60 points on this examination and you will have 60 minutes to complete it. The rst 10 minutes

### Linear functions are used to describe quantities that grow (or decline) at a constant rate. But not all functions are linear, or even nearly so.

Section 4.1 1 Exponential growth and decay Linear functions are used to describe quantities that grow (or decline) at a constant rate. But not all functions are linear, or even nearly so. There are commonly

### Markov Decision Processes and the Modelling of Patient Flows

Markov Decision Processes and the Modelling of Patient Flows Anthony Clissold Supervised by Professor Jerzy Filar Flinders University 1 Introduction Australia s growing and ageing population and a rise

### 1 Representation of Games. Kerschbamer: Commitment and Information in Games

1 epresentation of Games Kerschbamer: Commitment and Information in Games Game-Theoretic Description of Interactive Decision Situations This lecture deals with the process of translating an informal description

### Chapter 5 Discrete Probability Distribution. Learning objectives

Chapter 5 Discrete Probability Distribution Slide 1 Learning objectives 1. Understand random variables and probability distributions. 1.1. Distinguish discrete and continuous random variables. 2. Able

### Tetris: Experiments with the LP Approach to Approximate DP

Tetris: Experiments with the LP Approach to Approximate DP Vivek F. Farias Electrical Engineering Stanford University Stanford, CA 94403 vivekf@stanford.edu Benjamin Van Roy Management Science and Engineering

### Nonparametric adaptive age replacement with a one-cycle criterion

Nonparametric adaptive age replacement with a one-cycle criterion P. Coolen-Schrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK e-mail: Pauline.Schrijner@durham.ac.uk

### 15 Markov Chains: Limiting Probabilities

MARKOV CHAINS: LIMITING PROBABILITIES 67 Markov Chains: Limiting Probabilities Example Assume that the transition matrix is given by 7 2 P = 6 Recall that the n-step transition probabilities are given

### DATA MINING IN FINANCE

DATA MINING IN FINANCE Advances in Relational and Hybrid Methods by BORIS KOVALERCHUK Central Washington University, USA and EVGENII VITYAEV Institute of Mathematics Russian Academy of Sciences, Russia

### In Optimum Design 2000, A. Atkinson, B. Bogacka, and A. Zhigljavsky, eds., Kluwer, 2001, pp. 195 208.

In Optimum Design 2000, A. Atkinson, B. Bogacka, and A. Zhigljavsky, eds., Kluwer, 2001, pp. 195 208. ÔØ Ö ½ ÇÈÌÁÅÁ ÁÆ ÍÆÁÅÇ Ä Ê ËÈÇÆË ÍÆ ÌÁÇÆ ÇÊ ÁÆ Ê Î ÊÁ Ä Ë Â Ò À Ö Û ÈÙÖ Ù ÍÒ Ú Ö ØÝ Ï Ø Ä Ý ØØ ÁÆ ¼

### Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 13. Random Variables: Distribution and Expectation

CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 3 Random Variables: Distribution and Expectation Random Variables Question: The homeworks of 20 students are collected

### Reinforcement Learning of Task Plans for Real Robot Systems

Reinforcement Learning of Task Plans for Real Robot Systems Pedro Tomás Mendes Resende pedro.resende@ist.utl.pt Instituto Superior Técnico, Lisboa, Portugal October 2014 Abstract This paper is the extended

### Planning with Continuous Resources in Stochastic Domains

Planning with Continuous Resources in Stochastic Domains Mausam Dept. of Computer Science and Engineering University of Washington Seattle, WA 981952350 mausam@cs.washington.edu Abstract We consider the

### MODELING CUSTOMER RELATIONSHIPS AS MARKOV CHAINS

AS MARKOV CHAINS Phillip E. Pfeifer Robert L. Carraway f PHILLIP E. PFEIFER AND ROBERT L. CARRAWAY are with the Darden School of Business, Charlottesville, Virginia. INTRODUCTION The lifetime value of

### CPSC 322 Midterm Practice Questions

CPSC 322 Midterm Practice Questions The midterm will contain some questions similar to the ones in the list below. Some of these questions may actually come verbatim from the list below. Unless otherwise

### Improving Data Driven Part-of-Speech Tagging by Morphologic Knowledge Induction

Improving Data Driven Part-of-Speech Tagging by Morphologic Knowledge Induction Uwe D. Reichel Department of Phonetics and Speech Communication University of Munich reichelu@phonetik.uni-muenchen.de Abstract

### On the relationship between MDPs and the BDI architecture

On the relationship between MDPs and the BDI architecture Gerardo I. Simari Department of Computer Science University of Maryland College Park College Park, MD 20742 gisimari@cs.umd.edu Simon Parsons Dept

### Chapter 1. Introduction. 1.1 Motivation and Outline

Chapter 1 Introduction 1.1 Motivation and Outline In a global competitive market, companies are always trying to improve their profitability. A tool which has proven successful in order to achieve this

### 24 Uses of Turing Machines

Formal Language and Automata Theory: CS2004 24 Uses of Turing Machines 24 Introduction We have previously covered the application of Turing Machine as a recognizer and decider In this lecture we will discuss

### A Decomposition Approach for a Capacitated, Single Stage, Production-Inventory System

A Decomposition Approach for a Capacitated, Single Stage, Production-Inventory System Ganesh Janakiraman 1 IOMS-OM Group Stern School of Business New York University 44 W. 4th Street, Room 8-160 New York,

### RRE: A Game-Theoretic Intrusion Response and Recovery Engine

RRE: A Game-Theoretic Intrusion Response and Recovery Engine Saman A. Zonouz, Himanshu Khurana, William H. Sanders, and Timothy M. Yardley University of Illinois at Urbana-Champaign {saliari2, hkhurana,

Transportation Research Part A 36 (2002) 763 778 www.elsevier.com/locate/tra Optimal maintenance and repair policies in infrastructure management under uncertain facility deterioration rates: an adaptive

### The MultiAgent Decision Process toolbox: software for decision-theoretic planning in multiagent systems

The MultiAgent Decision Process toolbox: software for decision-theoretic planning in multiagent systems Matthijs T.J. Spaan Institute for Systems and Robotics Instituto Superior Técnico Lisbon, Portugal

### Some Research Directions in Automated Pentesting

Carlos Sarraute Research Directions in Automated Pentesting 1/50 Some Research Directions in Automated Pentesting Carlos Sarraute CoreLabs & ITBA PhD program Buenos Aires, Argentina H2HC October 29/30,

### Using Markov Decision Theory to Provide a Fair Challenge in a Roll-and-Move Board Game

Using Markov Decision Theory to Provide a Fair Challenge in a Roll-and-Move Board Game Éric Beaudry, Francis Bisson, Simon Chamberland, Froduald Kabanza Abstract Board games are often taken as examples

### Ratchet Equity Indexed Annuities

Ratchet Equity Indexed Annuities Mary R Hardy University of Waterloo Waterloo Ontario N2L 3G1 Canada Tel: 1-519-888-4567 Fax: 1-519-746-1875 email: mrhardy@uwaterloo.ca Abstract The Equity Indexed Annuity

### Performance Bounds for λ Policy Iteration and Application to the Game of Tetris

Journal of Machine Learning Research 14 (2013) 1181-1227 Submitted 10/11; Revised 10/12; Published 4/13 Performance Bounds for λ Policy Iteration and Application to the Game of Tetris Bruno Scherrer MAIA

### Monte Carlo Simulation

Chapter 4 Monte Carlo Simulation Due to the complexity of the problem, treating irregular spin systems with competing interactions in thermal equilibrium and nonequilibrium, we are forced to use Monte

### SEQUENCES ARITHMETIC SEQUENCES. Examples

SEQUENCES ARITHMETIC SEQUENCES An ordered list of numbers such as: 4, 9, 6, 25, 36 is a sequence. Each number in the sequence is a term. Usually variables with subscripts are used to label terms. For example,

### November 28 th, Carlos Guestrin. Lower dimensional projections

PCA Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University November 28 th, 2007 Lower dimensional projections Rather than picking a subset of the features, we can new features that are combinations

### Penetration Testing == POMDP Solving?

Penetration Testing == POMDP Solving? Carlos Sarraute Core Security Technologies & ITBA Buenos Aires, Argentina carlos@coresecurity.com Olivier Buffet and Jörg Hoffmann INRIA Nancy, France {olivier.buffet,joerg.hoffmann}@loria.fr

### Chapter 4. SDP - difficulties. 4.1 Curse of dimensionality

Chapter 4 SDP - difficulties So far we have discussed simple problems. Not necessarily in a conceptual framework, but surely in a computational. It makes little sense to discuss SDP, or DP for that matter,

### ALMOST COMMON PRIORS 1. INTRODUCTION

ALMOST COMMON PRIORS ZIV HELLMAN ABSTRACT. What happens when priors are not common? We introduce a measure for how far a type space is from having a common prior, which we term prior distance. If a type

### Inductive QoS Packet Scheduling for Adaptive Dynamic Networks

Inductive QoS Packet Scheduling for Adaptive Dynamic Networks Malika BOURENANE Dept of Computer Science University of Es-Senia Algeria mb_regina@yahoo.fr Abdelhamid MELLOUK LISSI Laboratory University

### Decomposition Approach to Serial Inventory Systems under Uncertainties

The University of Melbourne, Department of Mathematics and Statistics Decomposition Approach to Serial Inventory Systems under Uncertainties Jennifer Kusuma Honours Thesis, 2005 Supervisors: Prof. Peter

### Linear Equations in Linear Algebra

1 Linear Equations in Linear Algebra 1.2 Row Reduction and Echelon Forms ECHELON FORM A rectangular matrix is in echelon form (or row echelon form) if it has the following three properties: 1. All nonzero

### Equilibrium computation: Part 1

Equilibrium computation: Part 1 Nicola Gatti 1 Troels Bjerre Sorensen 2 1 Politecnico di Milano, Italy 2 Duke University, USA Nicola Gatti and Troels Bjerre Sørensen ( Politecnico di Milano, Italy, Equilibrium

### Lecture 12: The Black-Scholes Model Steven Skiena. http://www.cs.sunysb.edu/ skiena

Lecture 12: The Black-Scholes Model Steven Skiena Department of Computer Science State University of New York Stony Brook, NY 11794 4400 http://www.cs.sunysb.edu/ skiena The Black-Scholes-Merton Model

### Summary of Probability

Summary of Probability Mathematical Physics I Rules of Probability The probability of an event is called P(A), which is a positive number less than or equal to 1. The total probability for all possible

### Approximate Solutions to Factored Markov Decision Processes via Greedy Search in the Space of Finite State Controllers

pproximate Solutions to Factored Markov Decision Processes via Greedy Search in the Space of Finite State Controllers Kee-Eung Kim, Thomas L. Dean Department of Computer Science Brown University Providence,

### WEEK #22: PDFs and CDFs, Measures of Center and Spread

WEEK #22: PDFs and CDFs, Measures of Center and Spread Goals: Explore the effect of independent events in probability calculations. Present a number of ways to represent probability distributions. Textbook