Model-Free vs. Model-Based RL: Q, SARSA, & E³
Slide 1: Model-Free vs. Model-Based RL: Q, SARSA, & E³
Slide 2: Administrivia
- Reminder: office hours tomorrow are truncated to 9:00-10:15 AM; other times can be scheduled if necessary
- Final projects: final presentations on Dec 2, 7, and 9; 20 min (max) presentations, 3 or 4 per day
- Sign up for presentation slots today!
Slide 3: The Q-learning algorithm

  Algorithm: Q_learn
  Inputs: State space S; Action space A;
          Discount γ (0 <= γ < 1); Learning rate α (0 <= α < 1)
  Outputs: Q
  Repeat {
    s = get_current_world_state()
    a = pick_next_action(Q, s)
    (r, s') = act_in_world(a)
    Q(s,a) = Q(s,a) + α*(r + γ*max_a' Q(s',a') - Q(s,a))
  } Until (bored)
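As a concrete rendering of this pseudocode, here is a minimal tabular Q-learning sketch in Python. The 5-state chain environment, the ε-greedy choice inside pick_next_action (which the slides leave unspecified), and all parameter values are assumptions for illustration, not part of the slides:

```python
import random

def q_learning(n_states=5, n_actions=2, gamma=0.9, alpha=0.5,
               epsilon=0.1, episodes=500, seed=0):
    """Tabular Q-learning on a toy chain: action 1 moves right, action 0
    moves left; reaching the last state pays reward 1 and ends the episode."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def pick_next_action(s):
        # epsilon-greedy with random tie-breaking (an assumed exploration rule)
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        return max(range(n_actions), key=lambda a: (Q[s][a], rng.random()))

    def act_in_world(s, a):
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        return (1.0 if s2 == n_states - 1 else 0.0), s2

    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            a = pick_next_action(s)
            r, s2 = act_in_world(s, a)
            # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, the greedy policy moves right in every non-terminal state, and Q for the last pre-terminal state approaches the true value 1.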
Slide 4: SARSA-learning algorithm

  Algorithm: SARSA_learn
  Inputs: State space S; Action space A;
          Discount γ (0 <= γ < 1); Learning rate α (0 <= α < 1)
  Outputs: Q
  s = get_current_world_state()
  a = pick_next_action(Q, s)
  Repeat {
    (r, s') = act_in_world(a)
    a' = pick_next_action(Q, s')
    Q(s,a) = Q(s,a) + α*(r + γ*Q(s',a') - Q(s,a))
    a = a'; s = s'
  } Until (bored)
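The same toy-chain sketch, adapted for SARSA: only the update target changes, from max_a' Q(s',a') to the Q-value of the action actually chosen at s'. As before, the chain environment, the ε-greedy rule, and the parameters are assumptions for illustration:

```python
import random

def sarsa(n_states=5, gamma=0.9, alpha=0.5, epsilon=0.1, episodes=500, seed=0):
    """Tabular SARSA on a toy chain (action 1 = right, action 0 = left);
    the update target uses the action actually chosen at s', not the max."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]

    def pick_next_action(s):
        # epsilon-greedy with random tie-breaking (an assumed exploration rule)
        if rng.random() < epsilon:
            return rng.randrange(2)
        return max((0, 1), key=lambda a: (Q[s][a], rng.random()))

    def act_in_world(s, a):
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        return (1.0 if s2 == n_states - 1 else 0.0), s2

    for _ in range(episodes):
        s = 0
        a = pick_next_action(s)
        while s != n_states - 1:
            r, s2 = act_in_world(s, a)
            a2 = pick_next_action(s2)
            # on-policy target: Q(s',a') rather than max_a' Q(s',a')
            Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a])
            s, a = s2, a2
    return Q
```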
Slide 5: SARSA vs. Q
- SARSA and Q-learning are very similar
- SARSA updates Q(s,a) for the policy it's actually executing: it lets the pick_next_action() function pick the action used in the update
- Q-learning updates Q(s,a) for the greedy policy w.r.t. the current Q: it uses max_a' to pick the action in the update, which might differ from the action it actually executes at s'
- In practice: Q-learning will learn the true π*, but SARSA will learn about what it's actually doing
- Exploration can get Q-learning in trouble...
Slide 6: Radioactive breadcrumbs
- Can now define eligibility traces for SARSA
- In addition to the Q(s,a) table, keep an e(s,a) table: it records an eligibility (a real number) for each state/action pair
- At every step (each (s, a, r, s', a') tuple):
  - Increment e(s,a) for the current (s,a) pair by 1
  - Update every Q entry in proportion to its eligibility e
  - Decay every eligibility by a factor of λγ
- Leslie Kaelbling calls this the "radioactive breadcrumbs" form of RL
Slide 7: SARSA(λ)-learning algorithm

  Algorithm: SARSA(λ)_learn
  Inputs: S, A, γ (0<=γ<1), α (0<=α<1), λ (0<=λ<1)
  Outputs: Q
  e(s,a) = 0 // for all s, a
  s = get_current_world_state(); a = pick_next_action(Q, s)
  Repeat {
    (r, s') = act_in_world(a)
    a' = pick_next_action(Q, s')
    δ = r + γ*Q(s',a') - Q(s,a)
    e(s,a) += 1
    foreach (s'',a'') pair in (S×A) {
      Q(s'',a'') = Q(s'',a'') + α*e(s'',a'')*δ
      e(s'',a'') *= λγ
    }
    a = a'; s = s'
  } Until (bored)
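A sketch of this algorithm with accumulating traces on the same assumed toy chain. One deviation from the slide: the traces are reset at the start of each episode (a common episodic convention; the slide zeroes e only once, for a continuing task):

```python
import random

def sarsa_lambda(n_states=5, gamma=0.9, alpha=0.2, lam=0.8,
                 epsilon=0.1, episodes=300, seed=0):
    """Tabular SARSA(lambda) with accumulating eligibility traces on a toy
    chain (action 1 = right, action 0 = left; last state pays 1 and ends
    the episode). Environment and parameters are illustrative assumptions."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]

    def pick_next_action(s):
        # epsilon-greedy with random tie-breaking (assumed exploration rule)
        if rng.random() < epsilon:
            return rng.randrange(2)
        return max((0, 1), key=lambda a: (Q[s][a], rng.random()))

    def act_in_world(s, a):
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        return (1.0 if s2 == n_states - 1 else 0.0), s2

    for _ in range(episodes):
        e = [[0.0, 0.0] for _ in range(n_states)]  # episodic trace reset
        s = 0
        a = pick_next_action(s)
        while s != n_states - 1:
            r, s2 = act_in_world(s, a)
            a2 = pick_next_action(s2)
            delta = r + gamma * Q[s2][a2] - Q[s][a]
            e[s][a] += 1.0                       # drop a breadcrumb here
            for si in range(n_states):           # update all pairs by eligibility
                for ai in (0, 1):
                    Q[si][ai] += alpha * e[si][ai] * delta
                    e[si][ai] *= lam * gamma     # decay all traces by lambda*gamma
            s, a = s2, a2
    return Q
```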
Slide 8: The trail of crumbs — [figure: path taken by the agent] (Sutton & Barto, Sec. 7.5)

Slide 9: The trail of crumbs — [figure: action values increased by one-step SARSA, λ=0] (Sutton & Barto, Sec. 7.5)

Slide 10: The trail of crumbs — [figure: action values increased by SARSA(λ) with λ=0.9] (Sutton & Barto, Sec. 7.5)
Slide 11: Eligibility for a single state — [figure: an accumulating eligibility trace e(s_i, a_j) plotted over time, spiking at the 1st visit, 2nd visit, ... to the state and decaying between visits] (Sutton & Barto, Sec. 7.5)
Slide 12: Eligibility trace follow-up
- An eligibility trace allows:
  - Tracking where the agent has been
  - Backup of rewards over longer periods
  - Credit assignment: state/action pairs are rewarded for having contributed to reaching the reward
- Why does it work?
Slide 13: The forward view of eligibility
- Original SARSA did a one-step backup:
    Q(s,a) ← (1−α)Q(s,a) + α Q^(1)(s,a)
    Q^(1)(s,a) = r_t + γ Q(s_{t+1}, a_{t+1})
- [backup diagram: info backed up into Q(s,a) from r_t and Q(s_{t+1}, a_{t+1}); the rest of the trajectory is unused]
Slide 14: The forward view of eligibility
- Could also do a two-step backup:
    Q(s,a) ← (1−α)Q(s,a) + α Q^(2)(s,a)
    Q^(2)(s,a) = r_t + γ r_{t+1} + γ² Q(s_{t+2}, a_{t+2})
- [backup diagram: info backed up into Q(s,a) from r_t, r_{t+1}, and Q(s_{t+2}, a_{t+2})]
Slide 15: The forward view of eligibility
- Or even an n-step backup:
    Q(s,a) ← (1−α)Q(s,a) + α Q^(n)(s,a)
    Q^(n)(s,a) = Σ_{i=0}^{n−1} γ^i r_{t+i} + γ^n Q(s_{t+n}, a_{t+n})
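The n-step target above can be computed directly from a recorded reward sequence. A small helper (all names assumed; in a real trajectory the bootstrap value q_boot would be read off at a different state for each n, so it is simply passed in here):

```python
def n_step_target(rewards, gamma, n, q_boot):
    """n-step backup target: sum_{i=0}^{n-1} gamma^i * r_{t+i} + gamma^n * q_boot,
    where q_boot stands in for Q(s_{t+n}, a_{t+n})."""
    return sum(gamma ** i * r for i, r in enumerate(rewards[:n])) + gamma ** n * q_boot

# With rewards (r_t, r_{t+1}, r_{t+2}) = (1, 0, 2), gamma = 0.5, and a
# bootstrap value of 4 at the cutoff:
one_step = n_step_target([1, 0, 2], 0.5, 1, 4)    # 1 + 0.5*4 = 3.0
three_step = n_step_target([1, 0, 2], 0.5, 3, 4)  # 1 + 0 + 0.25*2 + 0.125*4 = 2.0
```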
Slide 16: The forward view of eligibility
- Small-step backups (n=1, n=2, etc.) are slow and nearsighted
- Large-step backups (n=100, n=1000, n=∞) are expensive and may miss near-term effects
- Want a way to combine them: can take a weighted average of different backups, e.g.:
    Q(s,a) ← (1−α)Q(s,a) + α( (1/3) Q^(2)(s,a) + (2/3) Q^(4)(s,a) )
Slide 17: The forward view of eligibility — [backup diagram: the weighted average of two backups, with weight 1/3 on the shorter backup and 2/3 on the longer one]
Slide 18: The forward view of eligibility
- How do you know which numbers of steps to average over? And what should the weights be?
- Accumulating eligibility traces are just a clever way to easily average over all n:
    Q(s,a) ← (1−α)Q(s,a) + α (1−λ) Σ_{i=1}^{∞} λ^{i−1} Q^(i)(s,a)
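Because the weights (1−λ)λ^{i−1} sum to 1, this λ-weighted average is itself a valid backup target. A sketch that computes a truncated version from a list of n-step targets, using the standard episodic convention (an assumption here) of giving the final target the leftover tail weight:

```python
def lambda_return(n_step_targets, lam):
    """Truncated lambda-return: weight the n-step target Q^(n) by
    (1-lam)*lam^(n-1), and give the final target the tail weight lam^(N-1)
    so the weights still sum to 1."""
    N = len(n_step_targets)
    out = 0.0
    for n in range(1, N):
        out += (1 - lam) * lam ** (n - 1) * n_step_targets[n - 1]
    return out + lam ** (N - 1) * n_step_targets[-1]

# Sanity checks: when every n-step target agrees, the average reproduces it
# exactly (weights sum to 1), and lam = 0 recovers the one-step backup.
```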
Slide 19: The forward view of eligibility — [backup diagram: the average α(1−λ) Σ_{i=1}^{∞} λ^{i−1} Q^(i)(s,a), with weights λ^0, λ^1, λ^2, ..., λ^{n−1} on successively longer backups]
Slide 20: Replacing traces
- The kind just described are accumulating eligibility traces: every time you return to a state/action pair, add an extra 1 to its eligibility
- There are also replacing eligibility traces: every time you return to a state/action pair, reset e(s,a) to 1
- Replacing traces work better sometimes
- [figure: accumulating vs. replacing trace for one pair, plotted against the times of state visits] (Sutton & Barto, Sec. 7.8)
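The difference between the two trace types is easy to see by evolving a single e(s,a) over time (a hypothetical helper, not from the slides):

```python
def trace_history(visit_steps, total_steps, lam, gamma, kind="accumulating"):
    """Evolve one eligibility e(s,a): decay by lam*gamma every step; on a
    visit, add 1 (accumulating) or reset to 1 (replacing). Returns the value
    recorded at each step."""
    e, history = 0.0, []
    for t in range(total_steps):
        if t in visit_steps:
            e = e + 1.0 if kind == "accumulating" else 1.0
        history.append(e)
        e *= lam * gamma
    return history

# Three visits in a row, lam = gamma = 0.9:
acc = trace_history({0, 1, 2}, 5, 0.9, 0.9, "accumulating")
rep = trace_history({0, 1, 2}, 5, 0.9, 0.9, "replacing")
# the accumulating trace climbs above 1 on repeat visits;
# the replacing trace is capped at 1
```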
Slide 21: Model-free vs. Model-based
Slide 22: What do you know?
- Both Q-learning and SARSA(λ) are model-free methods (a.k.a. value-based methods)
- They learn a Q function, and never learn T or R explicitly
- At the end of learning, the agent knows how to act, but doesn't explicitly know anything about the environment
- Also, no guarantees about the explore/exploit tradeoff
- Sometimes you want one or both of the above
Slide 23: Model-based methods
- Model-based methods, OTOH, do explicitly learn T & R
- At the end of learning, they have the entire model M = ⟨S, A, T, R⟩, and also π*
- At least one model-based method also guarantees explore/exploit tradeoff properties
Slide 24: E³
- E³: the Explicit Explore or Exploit algorithm (Kearns & Singh, Machine Learning 49, 2002)
- Explicitly keeps a T matrix and an R table
- Plans (via policy iteration) with the current T & R → current π
- Every state/action entry in T and R can be marked "known" or "unknown", and has a visit counter nv(s,a)
- After every ⟨s, a, r, s'⟩ tuple, update T & R (running average)
- When nv(s,a) > NVthresh, mark the cell as known and re-plan
- When all states are known, learning is done and we have π*
Slide 25: The E³ algorithm

  Algorithm: E3_learn_sketch // only an overview
  Inputs: S, A, γ (0<=γ<1), NVthresh, R_max, Var_max
  Outputs: T, R, π*
  Initialization:
    R(s) = R_max       // for all s
    T(s,a,s') = 1/|S|  // for all s, a, s'
    known(s,a) = 0; nv(s,a) = 0 // for all s, a
    π = policy_iter(S, A, T, R)
Slide 26: The E³ algorithm (cont'd)

  Algorithm: E3_learn_sketch // cont'd
  Repeat {
    s = get_current_world_state()
    a = π(s)
    (r, s') = act_in_world(a)
    T(s,a,s') = (1 + T(s,a,s')*nv(s,a)) / (nv(s,a) + 1)
    nv(s,a)++
    if (nv(s,a) > NVthresh) {
      known(s,a) = 1
      π = policy_iter(S, A, T, R)
    }
  } Until (all (s,a) known)
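The model-learning core of this sketch — visit counts, an empirical transition model, and the "known" test — can be written compactly. Planning via policy iteration and the R_max-style optimistic initialization are omitted, and all names below are assumptions for illustration, not the paper's code:

```python
from collections import defaultdict

class ModelLearner:
    """Keep nv(s,a) visit counts and empirical transition estimates, and mark
    a state/action pair 'known' once it has been tried often enough."""

    def __init__(self, nv_thresh=10):
        self.nv = defaultdict(int)       # nv(s,a): visits to each pair
        self.counts = defaultdict(int)   # counts of each (s, a, s') outcome
        self.nv_thresh = nv_thresh

    def observe(self, s, a, s2):
        """Record one <s, a, s'> experience tuple."""
        self.nv[(s, a)] += 1
        self.counts[(s, a, s2)] += 1

    def T(self, s, a, s2):
        # empirical transition probability; updating it incrementally as a
        # running average, as on the slide, yields the same value
        n = self.nv[(s, a)]
        return self.counts[(s, a, s2)] / n if n else 0.0

    def known(self, s, a):
        return self.nv[(s, a)] > self.nv_thresh
```

For example, after observing the pair (s=0, a=1) lead to state 1 eight times and state 0 four times, T(0,1,1) ≈ 2/3 and the pair (with 12 > 10 visits) is marked known.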