A Framework for Simulation-Based Online Planning


A Framework for Simulation-Based Online Planning
Martin Wirsing, in cooperation with Lenz Belzner and Rolf Hennicker
Invited talk at FACS 2015, Niterói
Modellierung Dynamischer und Adaptiver Systeme, WS 2016/17, 24 November 2016

Autonomous Systems
Autonomous systems have to adapt to environmental conditions and new requirements at runtime, even though they are defined at design time.
ASCENS project (2010-2015), EU-funded Integrated Project, 15 partners from 7 countries.
Developed a systematic approach for engineering autonomous ensembles, including SW process, formal modeling, verification, monitoring, adaptation, and awareness.
Case studies on robotics, cloud computing, e-mobility.

Decision Making under Uncertainty
- Very large state spaces (|S| > 10^10)
- Probabilistic effects
- Partially uncontrolled environment
- Incomplete design-time knowledge

Contents
1. Online planning
2. A generic framework for online planning
3. Simulation-based online planning
   3.1 The framework
   3.2 Monte Carlo Tree Search for discrete domains
   3.3 Cross Entropy for continuous domains
4. Concluding remarks

1. Online Planning

Online Planning
Cycle: observe the Real Situation, build the State Model and plan, execute.
Image sources: thegrid.soup.io/post/312159914, mobots.epfl.ch/marxbot.html


Online Planning (Informally, Sequential)
    while true do
      observe state
      plan
      execute action w.r.t. plan
    end while

Online Planning (Informally, Concurrent)
    Agent:                      Planner (concurrently):
      while true do               while true do
        observe state               plan
        execute                   end while
      end while

Online Planning: Parameters
- State space S, action space A
- Operation observe : Agent → S
- Attribute actionRequired : Agent → Bool
- Operation execute : RealAction → ()
Planning:
- Reward function R : S → ℝ  (getReward)
- Strategy P_Action(A | S)  (sampleAction)
Planning refines the initial strategy according to R.
Online planning = iterated execution and planning.
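To make these parameters concrete, here is a minimal Python sketch of the plug points as interfaces. The slide only names S, A, observe, actionRequired, execute, getReward, and sampleAction; every other identifier is an illustrative assumption, not the framework's API.

    from abc import ABC, abstractmethod
    from typing import Generic, TypeVar

    S = TypeVar("S")  # state space
    A = TypeVar("A")  # action space

    class RewardFunction(ABC, Generic[S]):
        @abstractmethod
        def get_reward(self, state: S) -> float:
            """R : S -> R, the reward of a state."""

    class Strategy(ABC, Generic[S, A]):
        @abstractmethod
        def sample_action(self, state: S) -> A:
            """Sample an action from P_Action(A | S)."""

    class Agent(ABC, Generic[S, A]):
        action_required: bool = False  # attribute actionRequired : Agent -> Bool

        @abstractmethod
        def observe(self) -> S:
            """observe : Agent -> S, read the current state model."""

        @abstractmethod
        def execute(self, action: A) -> None:
            """execute : RealAction -> (), act in the real domain."""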

Online Planning (Refined)
Agent ∥ Planner where

    Agent:
      while true do
        state ← observe()
        planner.state ← state
        when actionRequired do
          actionRequired ← false
          action ← planner.strategy.sampleAction(state)
        end when
        action.real.execute()
      end while

    Planner:
      while true do
        plan()
      end while
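A hedged Python sketch of the Agent ∥ Planner parallelization, using a background thread for the planner; the class name and synchronization details are assumptions made for illustration, not part of the framework.

    import threading

    class OnlinePlanningLoop:
        """Runs the agent loop while the planner refines the strategy concurrently."""

        def __init__(self, agent, planner):
            self.agent, self.planner = agent, planner

        def run(self):
            # Planner: while true do plan() end while, in a background thread.
            threading.Thread(target=self._plan_forever, daemon=True).start()
            # Agent: observe, share the latest state, act when an action is required.
            while True:
                state = self.agent.observe()
                self.planner.state = state
                if self.agent.action_required:
                    self.agent.action_required = False
                    action = self.planner.strategy.sample_action(state)
                    self.agent.execute(action)

        def _plan_forever(self):
            while True:
                self.planner.plan()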

Plug Points
The Agent ∥ Planner pseudocode above contains two kinds of plug points: domain-specific operations (such as observe and execute) and design choices (such as the strategy and the planning algorithm).

2. A Generic Framework for Online Planning

A Framework for Online Planning
The agent operates on the Real Situation w.r.t. its state and the strategy P_Action(A | S).
The planner changes the strategy w.r.t. the reward function.

3. Simulation-Based Online Planning

Approach
Refine the strategy P_Action(A | S) by Simulation-Based Planning:
- Provide the agent with a simulation of itself and its domain
- Generate simulations of future episodes
- Evaluate simulation episodes w.r.t. the reward function
- Use the estimates to refine the simulations
- Finally: execute a real action that performed well in simulation
- Repeat

Three Types of State: Real Situation, State Model, Simulation

3.1 The Framework for Simulation-Based Planning

The Framework for Simulation-Based Planning
Strategy P_Action(A | S); simulation P_Sim(S | S × A).
The planner changes the strategy: episodes are simulated w.r.t. the strategy and the domain dynamics P_Sim, and the simulation result refines the strategy, weighted by the episode reward.

SBP Parameters
Simulation P_Sim(S | S × A):
- The agent's model/knowledge of the domain dynamics
- Can be changed at runtime
- May differ from the real domain dynamics
- Can be learned/refined from observations
Maximum search depth h_max:
- Impacts simulation effort
- Fewer simulation steps: fast but shallow planning
- Can be dynamically adapted

Simulation-Based Planning Algorithm

    op plan()
      vars s, r, episode, a
      s ← state
      r ← rewardFct.getReward(s)
      episode ← nil
      for 0 .. h_max do
        a ← strategy.sampleAction(s)
        s ← simulation.sampleSuccessor(s, a)
        episode ← episode :: (s, a)
        r ← r + rewardFct.getReward(s)
      end for
      strategy ← updateStrategy(episode, r)
    end op
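The same operation, transcribed to Python under the assumption that state, rewardFct, strategy, simulation, and h_max are attributes of a planner object (the snake_case names are mine):

    def plan(self):
        s = self.state
        r = self.reward_fct.get_reward(s)
        episode = []                                    # episode ← nil
        for _ in range(self.h_max):
            a = self.strategy.sample_action(s)          # sample from P_Action(A | S)
            s = self.simulation.sample_successor(s, a)  # sample from P_Sim(S | S × A)
            episode.append((s, a))                      # episode ← episode :: (s, a)
            r += self.reward_fct.get_reward(s)          # accumulate episode reward
        self.strategy = self.strategy.update(episode, r)  # updateStrategy plug point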

Simulation-Based Planning: Plug Points
The plug points of plan() are the reward function (getReward), the strategy (sampleAction), the simulation (sampleSuccessor), and updateStrategy.

Simulation-Based Planning: Variants
Variants define updateStrategy(episode, r : ℝ):
- Vanilla Monte Carlo
- Genetic Algorithms
- Monte Carlo Tree Search for discrete domains
- Cross Entropy Planning for continuous domains
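As one illustration, a minimal sketch of the vanilla Monte Carlo variant: average the episode reward over the (state, action) pairs it visited, then act greedily on those averages. States are assumed hashable, and the exploration rule is my choice, not from the talk.

    import random
    from collections import defaultdict

    class VanillaMonteCarloStrategy:
        def __init__(self, actions):
            self.actions = actions
            self.total = defaultdict(float)  # cumulated episode reward per (state, action)
            self.count = defaultdict(int)    # nr. of episodes per (state, action)

        def sample_action(self, state):
            # Try every action at least once, then act greedily on average reward.
            unseen = [a for a in self.actions if self.count[(state, a)] == 0]
            if unseen:
                return random.choice(unseen)
            return max(self.actions,
                       key=lambda a: self.total[(state, a)] / self.count[(state, a)])

        def update(self, episode, reward):
            # updateStrategy(episode, r): weight visited pairs by the episode reward.
            for s, a in episode:
                self.total[(s, a)] += reward
                self.count[(s, a)] += 1
            return self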

3.2 Monte Carlo Tree Search for Discrete Domains
Strategy as a tree:
- Nodes represent states and action choices
- Add a node per simulation
- Aggregate simulation data (reward and frequency) in nodes
- Sample actions w.r.t. the aggregated data
Cameron B. Browne, Edward Powley, Daniel Whitehouse, Simon M. Lucas, Peter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis, and Simon Colton. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1-43, 2012.

Strategy Inside the Tree
Each node stores its cumulated reward and its number of episodes.
[Figure: search tree with nodes labeled "cumulated reward / nr. of episodes", e.g. root 11/21.]
E.g. Upper Confidence Bounds for Trees (UCT): treat action selection as a multi-armed bandit and select actions that maximize

    UCT_j = X̄_j + 2C √(2 ln n / n_j)

where the first term exploits observations and the second explores the solution space, with
- X̄_j: average reward of child node j
- n: nr. of episodes from the current node
- n_j: nr. of episodes from child node j
- C: UCT exploration constant
Kocsis, Levente, and Csaba Szepesvári. Bandit based Monte-Carlo planning. Machine Learning: ECML 2006. Springer Berlin Heidelberg, 2006, 282-293.
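In code, the selection rule reads as follows; this is a sketch against an assumed Node type with children, cumulated_reward, and visits fields, not the framework's implementation:

    import math

    def uct_select(node, C=0.7):
        """Pick the child maximizing UCT_j = X̄_j + 2C√(2 ln n / n_j)."""
        n = node.visits  # nr. of episodes through the current node
        def uct(child):
            if child.visits == 0:
                return float("inf")  # unvisited children are explored first
            avg_reward = child.cumulated_reward / child.visits  # X̄_j: exploitation
            return avg_reward + 2 * C * math.sqrt(2 * math.log(n) / child.visits)
        return max(node.children, key=uct)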

Expand the Tree
Add a new node when an episode leaves the tree.
[Figure: the tree from above with a fresh leaf 0/0 attached.]

Strategy Outside the Tree
Simulate the episode to depth h_max using the initial P_Action(A | S), and observe the result, e.g. the reward obtained (here: 0 or 1).
[Figure: episode leaving the tree at the new 0/0 leaf; observed reward: 1.]

Update Strategy
Update the statistics along the episode's path; this changes the strategy inside the tree.
[Figure: counts updated along the path, e.g. root 11/21 → 12/22 and the new leaf 0/0 → 1/1.]

Trees Represent Strategies
MCTS builds a skewed tree. The tree can be interpreted as P_Action(A | S): promising parts of the strategy space are preferred.

Example Domain: Search and Rescue
- Victims, fires, and ambulances
- Unknown topology, unknown initial situation
- Agent actions: noop, move, load or drop a victim, extinguish fire if adjacent
- Noise: actions may fail; fires ignite and cease
Experiment: Monte Carlo Tree Search
- Large state space (> 10^12)
- Large branching factor (2^18)
- 0.2 seconds/decision
- P_Sim(S | S × A) models the domain perfectly

Experimental Results (I): Autonomy
Measured (in %): victims at ambulance (blue), victims in a fire (red), positions on fire (green).
Provided reward: +100 per victim at the ambulance.
The system synthesized sensible behavior.
Results lie in a 0.95 confidence interval, checked with MultiVeStA.
Stefano Sebastio and Andrea Vandin. MultiVeStA: statistical model checking for discrete event simulators. ValueTools '13, 2013, 310-315.

Experimental Results (II): Robustness
Measured (in %): victims at ambulance (blue), victims in a fire (red), positions on fire (green).
The system is exposed to unexpected events at steps 20, 40, 60, 80: all carried victims are dropped, and new fires break out.
These events are NOT simulated by the planner, but the planner incorporates the new situation: the system showed sensible reactions.
Results lie in a 0.95 confidence interval.

Experimental Results (III): Flexibility
Measured (in %): victims at ambulance (blue), victims in a fire (red), positions on fire (green).
System goals are changed while operating, via a change of the reward function: steps 0-40 reward victims not in a fire; steps 40-80 reward victims at the ambulance.
The change is NOT simulated by the planner, but the planner incorporates the new situation: the system adapted its behavior w.r.t. the goals.
Results lie in a 0.95 confidence interval.

From Discrete to Continuous Domains
State and action space = ℝⁿ, e.g. actions (speed, rotation, duration).
Cross Entropy Planning:
- Approximate the (unknown) target distribution by a multivariate Gaussian distribution
- Sample (locally) and choose elite samples for updating the strategy ("sharpen" the Gaussian)
- Here: Gaussians over sequences of actions; sequence length = planning depth (see the sketch below)
Ari Weinstein and Michael L. Littman. Open-loop planning in large-scale stochastic domains. Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013.
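A hedged numpy sketch of this cross-entropy loop over action sequences; simulate_return (summing the rewards of one simulated episode) and all numeric parameters are assumptions for illustration.

    import numpy as np

    def cross_entropy_plan(simulate_return, action_dim, depth,
                           iterations=10, samples=100, elite_frac=0.1):
        mu = np.zeros((depth, action_dim))    # mean of the Gaussian over action sequences
        sigma = np.ones((depth, action_dim))  # per-dimension standard deviations
        n_elite = max(1, int(elite_frac * samples))
        for _ in range(iterations):
            seqs = np.random.normal(mu, sigma, size=(samples, depth, action_dim))
            returns = np.array([simulate_return(seq) for seq in seqs])
            elite = seqs[np.argsort(returns)[-n_elite:]]  # keep the best-scoring sequences
            mu = elite.mean(axis=0)                       # refit ("sharpen") the Gaussian
            sigma = elite.std(axis=0) + 1e-6              # small floor avoids collapse
        return mu[0]  # execute the first action of the refined mean sequence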

Cross Entropy Planning (Example)
- The white circle represents the agent; red boxes represent moving victims; black lines are simulation episodes
- Action parameters are speed, rotation, and duration
- The images show iterations 1, 5, and 10
- The simulation depth is adaptive here (reduced simulation cost)
- Note the iterative shaping of a promising strategy

Video: Cross Entropy Planning

Cross Entropy Planning: Experiments
[Figure: victims left over time in seconds, comparing planner variants.]
- CE: Cross Entropy Planning
- TACE: Time-Adaptive CE
- C3: Continuous CE Control
- h: planning depth; d: action duration
- Interleaved vs. parallel execution and planning

Concluding Remarks
Motivation: complex dynamic domains, high degrees of non-determinism.
Approach: model a space of solutions instead of a single one; online planning refines the solution space at runtime w.r.t. observations and knowledge to determine a currently viable action.
This Talk:
- A component framework for online planning
- Parallelization of execution and planning
- Instantiation: Simulation-Based Planning
- Two examples: MCTS, Cross Entropy Planning
Outlook:
- Model learning of domain dynamics
- Soft temporal logic for formal (statistical) verification
- Learning and planning for ensembles

References
1. Cameron B. Browne, Edward Powley, Daniel Whitehouse, Simon M. Lucas, Peter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis, and Simon Colton. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), 2012, 1-43.
2. Levente Kocsis and Csaba Szepesvári. Bandit based Monte-Carlo planning. In Machine Learning: ECML'06, Lecture Notes in Computer Science 4212, 2006, 282-293.
3. Sébastien Bubeck and Rémi Munos. Open Loop Optimistic Planning. In 23rd Conference on Learning Theory, COLT 2010, Omnipress, 2010, 477-489.
4. Ari Weinstein and Michael L. Littman. Open-loop planning in large-scale stochastic domains. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013.
5. Stefano Sebastio and Andrea Vandin. MultiVeStA: statistical model checking for discrete event simulators. In Proceedings of the 7th International Conference on Performance Evaluation Methodologies and Tools (ValueTools '13), 2013, 310-315.
6. Lenz Belzner, Rolf Hennicker, and Martin Wirsing. OnPlan: A Framework for Simulation-Based Online Planning. In Christiano Braga and Peter Csaba Ölveczky (eds.), Formal Aspects of Component Software - 12th International Conference, FACS 2015, Revised Selected Papers, Lecture Notes in Computer Science 9539, 2016, 1-30.