Model-Free vs. Model-Based RL: Q, SARSA, & E³


1 Model-Free vs. Model-Based RL: Q, SARSA, & E³

2 Administrivia Reminder: office hours tomorrow are truncated to 9:00-10:15 AM; can schedule other times if necessary. Final projects: final presentations Dec 2, 7, 9; 20 min (max) presentations, 3 or 4 per day. Sign up for presentation slots today!

3 The Q-learning algorithm
Algorithm: Q_learn
Inputs: State space S; action space A; discount γ (0<=γ<1); learning rate α (0<=α<1)
Outputs: Q
Repeat {
  s=get_current_world_state()
  a=pick_next_action(Q,s)
  (r,s')=act_in_world(a)
  Q(s,a)=Q(s,a)+α*(r+γ*max_a'(Q(s',a'))-Q(s,a))
} Until (bored)
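A minimal tabular sketch of this update in Python (illustrative only; the environment interface env.reset()/env.step() and the ε-greedy action choice are assumptions, not part of the original slides):

# Minimal tabular Q-learning sketch (illustrative; the env interface is an assumption).
import random
from collections import defaultdict

def q_learning(env, actions, episodes=1000, gamma=0.9, alpha=0.1, epsilon=0.1):
    Q = defaultdict(float)                          # Q[(s, a)] -> value, default 0
    for _ in range(episodes):
        s = env.reset()                             # assumed: returns the start state
        done = False
        while not done:
            # pick_next_action: here, epsilon-greedy exploration
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])
            r, s_next, done = env.step(a)           # assumed: reward, next state, terminal flag
            # off-policy backup: bootstrap from the greedy action at s'
            best_next = 0.0 if done else max(Q[(s_next, x)] for x in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q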

4 SARSA-learning algorithm
Algorithm: SARSA_learn
Inputs: State space S; action space A; discount γ (0<=γ<1); learning rate α (0<=α<1)
Outputs: Q
s=get_current_world_state()
a=pick_next_action(Q,s)
Repeat {
  (r,s')=act_in_world(a)
  a'=pick_next_action(Q,s')
  Q(s,a)=Q(s,a)+α*(r+γ*Q(s',a')-Q(s,a))
  a=a'; s=s'
} Until (bored)
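The same kind of sketch for SARSA; the only change is which action's value gets backed up (again illustrative, with the same assumed env interface):

# Minimal tabular SARSA sketch (illustrative; same assumed env interface as above).
import random
from collections import defaultdict

def epsilon_greedy(Q, s, actions, epsilon):
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def sarsa(env, actions, episodes=1000, gamma=0.9, alpha=0.1, epsilon=0.1):
    Q = defaultdict(float)
    for _ in range(episodes):
        s = env.reset()
        a = epsilon_greedy(Q, s, actions, epsilon)
        done = False
        while not done:
            r, s_next, done = env.step(a)
            a_next = epsilon_greedy(Q, s_next, actions, epsilon)
            # on-policy backup: bootstrap from the action SARSA will actually take at s'
            target = r + gamma * (0.0 if done else Q[(s_next, a_next)])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, a = s_next, a_next
    return Q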

5 SARSA vs. Q SARSA and Q-learning are very similar. SARSA updates Q(s,a) for the policy it's actually executing: it lets the pick_next_action() function choose the action used in the update. Q-learning updates Q(s,a) for the greedy policy w.r.t. the current Q: it uses max_a' to pick the update action, which might differ from the action it actually executes at s'. In practice: Q-learning will learn the true π*, but SARSA will learn about what it's actually doing. Exploration can get Q-learning in trouble...

6 Radioactive breadcrumbs Can now define eligibility traces for SARSA. In addition to the Q(s,a) table, keep an e(s,a) table that records an eligibility (a real number) for each state/action pair. At every step (each (s,a,r,s',a') tuple): increment e(s,a) for the current (s,a) pair by 1; update all Q values in proportion to their eligibilities; decay all eligibilities by a factor of λγ. Leslie Kaelbling calls this the "radioactive breadcrumbs" form of RL.

7 SARSA(λ)-learning alg.
Algorithm: SARSA(λ)_learn
Inputs: S, A, γ (0<=γ<1), α (0<=α<1), λ (0<=λ<1)
Outputs: Q
e(s,a)=0 // for all s, a
s=get_curr_world_st(); a=pick_nxt_act(Q,s)
Repeat {
  (r,s')=act_in_world(a)
  a'=pick_next_action(Q,s')
  δ=r+γ*Q(s',a')-Q(s,a)
  e(s,a)+=1
  foreach (s'',a'') pair in S×A {
    Q(s'',a'')=Q(s'',a'')+α*e(s'',a'')*δ
    e(s'',a'')*=λγ
  }
  a=a'; s=s'
} Until (bored)
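A sketch of the same loop in Python with an accumulating trace table (illustrative; it keeps only nonzero traces in a dict rather than literally looping over all of S×A):

# Minimal SARSA(lambda) sketch with accumulating traces (illustrative; env interface assumed).
import random
from collections import defaultdict

def sarsa_lambda(env, actions, episodes=1000, gamma=0.9, alpha=0.1, lam=0.9, epsilon=0.1):
    Q = defaultdict(float)
    for _ in range(episodes):
        e = defaultdict(float)                      # eligibility traces, reset each episode
        s = env.reset()
        a = (random.choice(actions) if random.random() < epsilon
             else max(actions, key=lambda x: Q[(s, x)]))
        done = False
        while not done:
            r, s_next, done = env.step(a)
            a_next = (random.choice(actions) if random.random() < epsilon
                      else max(actions, key=lambda x: Q[(s_next, x)]))
            delta = r + gamma * (0.0 if done else Q[(s_next, a_next)]) - Q[(s, a)]
            e[(s, a)] += 1.0                        # accumulate eligibility for the visited pair
            for sa in list(e):                      # only pairs with nonzero trace matter
                Q[sa] += alpha * e[sa] * delta
                e[sa] *= gamma * lam                # decay every trace by lambda*gamma
            s, a = s_next, a_next
    return Q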

8 The trail of crumbs [Figure: path taken by the agent through a gridworld. Sutton & Barto, Sec 7.5]

9 The trail of crumbs [Figure: action values increased by one-step SARSA (λ=0). Sutton & Barto, Sec 7.5]

10 The trail of crumbs [Figure: action values increased by SARSA(λ) with λ=0.9. Sutton & Barto, Sec 7.5]

11 Eligibility for a single state [Figure: the accumulating eligibility trace e(s_i,a_j) over time, jumping up at the 1st and 2nd visits to the state and decaying between visits. Sutton & Barto, Sec 7.5]

12 Eligibility trace followup The eligibility trace allows: tracking where the agent has been; backup of rewards over longer periods; credit assignment: state/action pairs are rewarded for having contributed to getting to the reward. Why does it work?

13 The forward view of elig. Original SARSA did a one-step backup: Q(s,a) ← (1−α)·Q(s,a) + α·Δ^(1)Q(s,a), where Δ^(1)Q(s,a) = r_t + γ·Q(s_{t+1},a_{t+1}). [Diagram: the backed-up info is the immediate reward r_t plus Q(s_{t+1},a_{t+1}) standing in for the rest of the trajectory.]

14 The forward view of elig. Original SARSA did a one-step backup: Q(s,a) ← (1−α)·Q(s,a) + α·Δ^(1)Q(s,a), where Δ^(1)Q(s,a) = r_t + γ·Q(s_{t+1},a_{t+1}). Could also do a two-step backup: Q(s,a) ← (1−α)·Q(s,a) + α·Δ^(2)Q(s,a), where Δ^(2)Q(s,a) = r_t + γ·r_{t+1} + γ²·Q(s_{t+2},a_{t+2}). [Diagram: the backed-up info is now r_t, r_{t+1}, and Q(s_{t+2},a_{t+2}) for the rest of the trajectory.]

15 The forward view of elig. Original SARSA did a one-step backup: Δ^(1)Q(s,a) = r_t + γ·Q(s_{t+1},a_{t+1}). Could also do a two-step backup: Δ^(2)Q(s,a) = r_t + γ·r_{t+1} + γ²·Q(s_{t+2},a_{t+2}). Or even an n-step backup: Q(s,a) ← (1−α)·Q(s,a) + α·Δ^(n)Q(s,a), where Δ^(n)Q(s,a) = Σ_{i=0}^{n−1} γ^i·r_{t+i} + γ^n·Q(s_{t+n},a_{t+n}).

16 The forward view of elig. Small-step backups (n=1, n=2, etc.) are slow and nearsighted. Large-step backups (n=100, n=1000, n=∞) are expensive and may miss near-term effects. Want a way to combine them. Can take a weighted average of different backups, e.g. with weight 1/3 on a 2-step backup and 2/3 on a longer n-step backup: Q(s,a) ← (1−α)·Q(s,a) + α·( (1/3)·Δ^(2)Q(s,a) + (2/3)·Δ^(n)Q(s,a) )

17 The forward view of elig. [Diagram: the weighted-average backup Q(s,a) ← (1−α)·Q(s,a) + α·( (1/3)·Δ^(2)Q(s,a) + (2/3)·Δ^(n)Q(s,a) ) drawn over the trajectory, with weight 1/3 on the shorter backup and 2/3 on the longer one.]

18 The forward view of elig. How do you know which number of steps to average over? And what the weights should be? Accumulating eligibility traces are just a clever way to easily average over all n: Q(s,a) ← (1−α)·Q(s,a) + α·(1−λ)·Σ_{i=1}^{∞} λ^{i−1}·Δ^(i)Q(s,a)
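For intuition, here is a small forward-view sketch that computes the n-step backups and their λ-weighted average for one recorded episode (illustrative only; the reward and Q-value lists are assumed inputs, and the leftover weight on the full-episode backup handles the finite-episode case):

# Forward-view lambda-return sketch for one recorded episode (illustrative).
# rewards[i] is r_{t+i}; q_values[i] is Q(s_{t+i}, a_{t+i}); index 0 is the step being updated.

def n_step_backup(rewards, q_values, n, gamma):
    """Delta^(n) Q: n discounted rewards plus the bootstrapped value n steps ahead."""
    n = min(n, len(rewards))                        # truncate at the end of the episode
    target = sum(gamma ** i * rewards[i] for i in range(n))
    if n < len(q_values):                           # bootstrap only if a later estimate exists
        target += gamma ** n * q_values[n]
    return target

def lambda_return(rewards, q_values, gamma, lam):
    """Weighted average (1-lambda) * sum_i lambda^(i-1) * Delta^(i) Q over all available i."""
    total, weight_left = 0.0, 1.0
    for i in range(1, len(rewards) + 1):
        w = (1 - lam) * lam ** (i - 1)
        total += w * n_step_backup(rewards, q_values, i, gamma)
        weight_left -= w
    # the leftover weight goes to the full-episode backup so the weights sum to 1
    total += weight_left * n_step_backup(rewards, q_values, len(rewards), gamma)
    return total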

19 The forward view of elig. [Diagram: the λ-weighted average α·(1−λ)·Σ_{i=1}^{∞} λ^{i−1}·Δ^(i)Q(s,a), with weights λ⁰, λ¹, λ², ..., λ^{n−1} on successively longer backups.]

20 Replacing traces The kind just described are accumulating e-traces: every time you return to a state/action, you add extra eligibility. There are also replacing eligibility traces: every time you return to a state/action, reset e(s,a) to 1. Works better sometimes. [Figure: e(s,a) over times of state visits for an accumulating trace vs. a replacing trace. Sutton & Barto, Sec 7.8]
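In code the two variants differ by a single line in the trace update (sketch, using the e table from the SARSA(λ) sketch above):

e[(s, a)] += 1.0    # accumulating trace: repeat visits pile up, so eligibility can exceed 1
e[(s, a)] = 1.0     # replacing trace: a repeat visit just resets eligibility to 1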

21 Model-free vs. Model-based

22 What do you know? Both Q-learning and SARSA(λ) are model-free methods (a.k.a. value-based methods): they learn a Q function but never learn T or R explicitly. At the end of learning, the agent knows how to act, but doesn't explicitly know anything about the environment. Also, there are no guarantees about the explore/exploit tradeoff. Sometimes you want one or both of the above.

23 Model-based methods Model-based methods, OTOH, do explicitly learn T & R. At the end of learning, they have the entire model M = ⟨S,A,T,R⟩, and also have π*. At least one model-based method also guarantees explore/exploit tradeoff properties.

24 E³ Efficient Explore & Exploit algorithm (Kearns & Singh, Machine Learning 49, 2002). Explicitly keeps a T matrix and an R table. Plan (policy iteration) with the current T & R -> current π. Every state/action entry in T and R can be marked known or unknown and has a #visits counter, nv(s,a). After every ⟨s,a,r,s'⟩ tuple, update T & R (running average). When nv(s,a) > NVthresh, mark the cell as known & re-plan. When all states are known, learning is done & we have π*.

25 The E³ algorithm
Algorithm: E3_learn_sketch // only an overview
Inputs: S, A, γ (0<=γ<1), NVthresh, R_max, Var_max
Outputs: T, R, π*
Initialization:
  R(s)=R_max // for all s
  T(s,a,s')=1/|S| // for all s,a,s'
  known(s,a)=0; nv(s,a)=0 // for all s,a
  π=policy_iter(S,A,T,R)

26 The E³ algorithm
Algorithm: E3_learn_sketch // con't
Repeat {
  s=get_current_world_state()
  a=π(s)
  (r,s')=act_in_world(a)
  T(s,a,s')=(1+T(s,a,s')*nv(s,a))/(nv(s,a)+1)
  nv(s,a)++
  if (nv(s,a)>NVthresh) {
    known(s,a)=1
    π=policy_iter(S,A,T,R)
  }
} Until (all (s,a) known)
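A rough Python sketch of this bookkeeping (illustrative only: the planner is passed in as an assumed policy_iteration(states, actions, T, R, gamma) function returning a policy dict, the env interface is an assumption, and the full explore-vs-exploit analysis of Kearns & Singh is omitted):

# Rough sketch of the E^3 bookkeeping loop (illustrative; omits the balanced-wandering /
# exploit-vs-explore machinery of the real algorithm).
from collections import defaultdict

def e3_sketch(env, states, actions, policy_iteration,
              gamma=0.9, nv_thresh=50, r_max=1.0, max_steps=100000):
    T = defaultdict(lambda: 1.0 / len(states))      # T[(s, a, s')] -> estimated probability
    R = {s: r_max for s in states}                  # optimistic initial reward estimates
    nv = defaultdict(int)                           # visit counts nv[(s, a)]
    ns = defaultdict(int)                           # state visit counts, for the R average
    known = set()                                   # (s, a) pairs with enough visits
    pi = policy_iteration(states, actions, T, R, gamma)   # assumed planner

    s = env.reset()
    for _ in range(max_steps):
        a = pi[s]
        r, s_next = env.step(a)                     # assumed env interface
        n = nv[(s, a)]
        # running-average update of T(s,a,.); the slide shows only the observed-s' line,
        # the renormalization of the other destinations is added here so T stays a distribution
        T[(s, a, s_next)] = (1 + T[(s, a, s_next)] * n) / (n + 1)
        for s2 in states:
            if s2 != s_next:
                T[(s, a, s2)] = T[(s, a, s2)] * n / (n + 1)
        R[s] = (r + R[s] * ns[s]) / (ns[s] + 1)     # running-average reward estimate
        ns[s] += 1
        nv[(s, a)] += 1
        if nv[(s, a)] > nv_thresh and (s, a) not in known:
            known.add((s, a))                       # enough data: mark known and re-plan
            pi = policy_iteration(states, actions, T, R, gamma)
        if len(known) == len(states) * len(actions):
            break                                   # all (s, a) known: pi approximates pi*
        s = s_next
    return pi, T, R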

Suggested reading: Chapter 7 (Eligibility Traces) in R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
