Using abstract models of behaviors to automatically generate reinforcement learning hierarchies

1 Hauptseminar Intelligente Autonome Systeme
Using abstract models of behaviors to automatically generate reinforcement learning hierarchies
Christian Sosnowski
Supervisor: Freek Stulp
Technische Universität München, Fakultät für Informatik, Forschungs- und Lehreinheit Informatik IX

2 Content
- Introduction
- Fundamentals
- Building Task Hierarchies through planning
- The P-HSMQ Algorithm
- Termination Improvement
- Conclusion
Technische Universität München, Christian Sosnowski, Hauptseminar

3 Introduction
- Intelligent autonomous systems
- Learning in general: no detailed programming is needed to solve a problem
- Reinforcement learning: the solution can be unknown in advance; a proven method in artificial intelligence
- But: the curse of dimensionality. Increased complexity raises the number of states exponentially (see the example)

4 Introduction: The example
- The robot knows its position
- It can move to one of the eight neighboring cells
- It can pick up and carry an object
- The task: pick up the coffee and the book and carry them to the lounge

5 Introduction: Solution
- Exploit the structure of the problem
- Use background knowledge
- Avoid useless exploration
- Provide general guidance
- Let the system learn inside a limited scope
- That's what this presentation is about

7 Fundamentals: The Markov Decision Process (MDP)
- Restricts the information needed for a decision
- No hidden states
- No dependence on history
- All probabilities involved depend only on the current state and action

8 Fundamentals: Q-learning
- Incremental learning algorithm (step by step)
- Uses a table to store the Q-value of each state-action pair
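
The table-based update can be sketched as follows. This is a minimal illustration of the standard one-step Q-learning rule, not code from the presentation: the function name `q_update`, the action labels, and the default values of `alpha` and `gamma` are assumptions chosen for the example, and terminal-state handling is left out.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the new Q-value.

    q          : dict mapping (state, action) -> value, missing entries count as 0
    alpha      : learning rate (illustrative default)
    gamma      : discount factor (illustrative default)
    """
    # Value of the best action available in the successor state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Blend the old estimate with the newly observed one-step return.
    q[(state, action)] = (1 - alpha) * q[(state, action)] + alpha * (reward + gamma * best_next)
    return q[(state, action)]


# Hypothetical usage: eight movement actions, one observed transition.
actions = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
q = defaultdict(float)
new_value = q_update(q, "C2", "W", 1.0, "C1", actions)
```

With an empty table, the single update moves the entry for ("C2", "W") one tenth of the way toward the observed reward.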

9 Fundamentals: Q-learning, small example
- Rectangle of 3 x 4 positions
- The robot is at A4 and has to reach C1
- α = 0.1, γ = 0.9

10 Fundamentals: Q-learning, small example
- Paths are chosen at random
- Path A4-A3-A2-B2-C2-C1: only the value at position C2 is updated:
  C2 = 0.9*0 + 0.1*(0 + 0.9*1) = 0.09

11 Fundamentals: Q-learning, small example
- Path A4-B4-C4-B3-A3-B2-C1: only the value at position B2 is updated:
  B2 = 0.9*0 + 0.1*(0 + 0.9*1) = 0.09

12 Fundamentals: Q-learning, small example
- Path A4-A3-B3-B2-C1: the following values are updated:
  B3 = 0.9*0 + 0.1*(0 + 0.9*0.09) = 0.0081
  B2 = 0.9*0.09 + 0.1*(0 + 0.9*1) = 0.171
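
The three paths of the small example can be replayed in a few lines. The sketch below follows the arithmetic on the slides, which tracks one value per cell rather than one per state-action pair. It assumes a learning rate of 0.1, a discount factor of 0.9, a reward of 0 on every step, and a goal cell C1 whose value is fixed at 1; the function name `replay` is hypothetical.

```python
ALPHA = 0.1  # learning rate, as read off the slide arithmetic
GAMMA = 0.9  # discount factor, as read off the slide arithmetic
GOAL = "C1"


def replay(paths, alpha=ALPHA, gamma=GAMMA, goal=GOAL):
    """Replay fixed paths, updating each visited cell from its successor."""
    values = {goal: 1.0}  # assumption: the goal cell holds value 1
    for path in paths:
        # Walk the path front to back, one (cell, successor) pair at a time.
        for cell, successor in zip(path, path[1:]):
            old = values.get(cell, 0.0)
            target = 0.0 + gamma * values.get(successor, 0.0)  # reward is 0
            values[cell] = (1 - alpha) * old + alpha * target
    return values


paths = [
    ["A4", "A3", "A2", "B2", "C2", "C1"],
    ["A4", "B4", "C4", "B3", "A3", "B2", "C1"],
    ["A4", "A3", "B3", "B2", "C1"],
]
values = replay(paths)
```

After the three paths, `values` holds 0.09 for C2, 0.0081 for B3 and 0.171 for B2, up to floating-point rounding; all other non-goal cells are still 0.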

13 Fundamentals: Q-learning, small example
- Finally the algorithm converges to the following numbers: [value table not preserved]

14 Fundamentals: The curse of dimensionality
- In simple examples Q-learning converges nicely
- In practice its performance is very poor
- Real-world problems have multi-dimensional state spaces
- The number of state-action pairs grows exponentially with the number of dimensions
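
The growth can be made concrete with a short calculation. In this sketch the number of states is the product of the number of discrete readings per dimension, and the Q-table needs one entry per state-action pair. The function names and the figure of 100 readings per dimension are illustrative assumptions, not values from the presentation.

```python
import math


def state_count(readings_per_dimension):
    """Number of distinct states: product of the readings of every dimension."""
    return math.prod(readings_per_dimension)


def q_table_size(readings_per_dimension, n_actions):
    """Number of Q-table entries: one per state-action pair."""
    return state_count(readings_per_dimension) * n_actions


# Hypothetical sensors with 100 discrete readings each.
for dimensions in range(1, 6):
    print(dimensions, state_count([100] * dimensions))
```

Each added dimension multiplies the table size by the number of readings of that dimension, which is why the table quickly becomes too large to fill by exploration.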

15 Fundamentals: The curse of dimensionality, real-world problem
- Flying an airplane
- Every gauge has numerous discrete readings (e.g. the dive angle, 0-90°)
- With every additional gauge the number of states grows exponentially
- 90 (dive angle) * ... (0-600 speed) * ... (ft altitude): on the order of 3*10^8 states
- [Video: aircraft view]

17 Building Task Hierarchies through planning
- There are no general-purpose solutions
- Divide the problem into smaller subtasks and solve them one after the other
- Malcolm R. K. Ryan defines behaviors that describe the subtasks and combines them into a plan

18 Building Task Hierarchies through planning: Subtasks in the example
- Go(Room1, Room2)
- Get(Object, Room)

19 Building Task Hierarchies through planning: Building the plan
- Formal language (figure labels: State, Goal)
- Teleo-operators define a goal-directed behavior with a pre- and a postcondition
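
One way to picture a teleo-operator is as a named behavior with two sets of facts. The sketch below is a hypothetical representation, not the formal language used in the presentation: the class `Behavior`, the fact strings, and the room names `hall` and `kitchen` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Behavior:
    """A goal-directed behavior described by a pre- and a postcondition."""
    name: str
    pre: FrozenSet[str]   # facts that must hold before the behavior may start
    post: FrozenSet[str]  # facts the behavior is meant to bring about

    def applicable(self, state):
        """True if every precondition fact holds in the given state."""
        return self.pre <= state

    def achieved(self, state):
        """True if every postcondition fact holds in the given state."""
        return self.post <= state


# Hypothetical instances of the Go and Get subtasks.
go = Behavior("Go(hall, lounge)",
              frozenset({"robot_in(hall)"}),
              frozenset({"robot_in(lounge)"}))
get = Behavior("Get(coffee, kitchen)",
               frozenset({"robot_in(kitchen)", "coffee_in(kitchen)"}),
               frozenset({"holding(coffee)"}))

state = frozenset({"robot_in(hall)", "coffee_in(kitchen)"})
can_go = go.applicable(state)    # robot is in the hall
can_get = get.applicable(state)  # robot is not in the kitchen
```

A planner can chain such behaviors by matching the postcondition of one to the precondition of the next, which is the kind of structure a plan tree would encode.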

20 Building Task Hierarchies through planning
- The plan is represented as a tree
- More than one node can be active

21 Building Task Hierarchies through planning: Combining Planning and Learning
- Combine the plan with reinforcement learning
- Local reward function [formula not preserved]
- Executing an action a in state s results in a transition to state s'

22 Building Task Hierarchies through planning: Combining Planning and Learning
- Overall aim: [slide content not preserved]

24-27 The P-HSMQ Algorithm [slide content not preserved]

28 The P-HSMQ Algorithm: Experiment 1
- Compared: P-HSMQ, HSMQ with all behaviors, plan without HRL
- α = 0.1, γ = 0.95
- Trial length below 500: HSMQ with all behaviors: [not preserved], P-HSMQ: [not preserved]

30 Termination Improvement
- P-HSMQ always finishes a behavior once it has started
- It ignores effects that might make the behavior's actions no longer appropriate
- The example with the bump
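
The difference between finishing a behavior unconditionally and stopping it early can be shown with a toy execution loop. This sketch covers control flow only and does no learning. All names are hypothetical, and the spill at the bump is made deterministic here to keep the example reproducible.

```python
BUMP = 2    # hypothetical position of the bump
LOUNGE = 5  # hypothetical position of the lounge


def step(state):
    """Move one cell toward the lounge; the coffee is spilled at the bump."""
    position, has_coffee = state
    position += 1
    if position == BUMP:
        has_coffee = False
    return (position, has_coffee)


def run_behavior(state, interruptible, max_steps=100):
    """Run a 'carry the coffee to the lounge' behavior and report the outcome."""
    for _ in range(max_steps):
        position, has_coffee = state
        if position == LOUNGE:
            return state, "completed"
        # An interruptible behavior stops once it is no longer appropriate.
        if interruptible and not has_coffee:
            return state, "interrupted"
        state = step(state)
    return state, "timeout"


always_finish = run_behavior((0, True), interruptible=False)
stop_early = run_behavior((0, True), interruptible=True)
```

The non-interruptible run walks all the way to the lounge without coffee, while the interruptible run stops right after the bump, so a higher level could choose a more suitable behavior at that point.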

31 Termination Improvement [slide content not preserved]

32 Termination Improvement: Experiment 2
- Comparing P-HSMQ with TRQ
- α = 0.1, γ = 0.95, chance to spill the coffee: 0.1
- Trial length below 500: P-HSMQ: [not preserved], TRQ: [not preserved]
- Final learnt policy: [not preserved]

34 Conclusion
- Q-learning alone is only a basic means of solving a problem
- Combining abstract models (a plan) with reinforcement learning improves performance significantly
- High-level plans are developed by humans; reinforcement learning handles low-level optimization
- Back to the example of the autopilot
- Still to solve:
  - Analyze what went wrong if a plan failed
  - Let the system invent new behaviors on its own to fit the circumstances that arise

35 Any questions?
