Intelligent Mobile Vehicle Navigator Based on Fuzzy Logic and Reinforcement Learning
Aashay Harlalka [100050035], Ankush Das [100050042], Pulkit Maheshwari [100050043]
Introduction The navigation of a mobile vehicle can be considered as the task of determining a collision-free path that enables the vehicle to travel through an obstacle course from an initial configuration to a goal configuration (the path planning problem). Path planning can be classified into: Global path planning: an exact model of the environment is used to plan the path. Local path planning: obstacle avoidance methods are used, such as the potential field method (proposed by Khatib).
Motivation Drawbacks of Global Planning Methods They can be conducted only in a completely known environment. Their time complexity grows with the geometric complexity, and grows exponentially with the number of degrees of freedom in the vehicle's motion. Drawbacks of Local Planning Methods (Potential Field Method) A local minimum could occur and cause the vehicle to get stuck. It tends to cause unstable motion in the presence of obstacles. It is difficult to find the force coefficients influencing the vehicle's velocity and direction in an unknown environment.
Motivation Fuzzy Logic Approach Efficient in tackling the problem of obstacle avoidance, without requiring an analytical model of the environment. Each rule of the rule base has a physical meaning, making it possible to tune the rules using expert knowledge. Drawbacks of Fuzzy Logic It is difficult to construct the rules consistently. Tuning the constructed rules is time consuming.
Motivation EEM (Environment Exploration Method) Training Method Uses reinforcement learning to associate an appropriate action with a situation. Drawbacks of the EEM Slow and uncertain convergence of the learning process. Insufficiently learned rule base.
Overview of the Navigator Vehicle Model and Sensor arrangement Uses a cylindrical mobile platform driven by three active wheels. Equipped with an ultrasonic sensor ring having N sensors evenly distributed along the ring.
Overview of the Navigator Coordinate Systems and Navigation Task Each navigation task is specified in the world coordinate system, where the vehicle configuration is represented by S = (X_0, Y_0, θ), where (X_0, Y_0) are the coordinates of the vehicle's center and θ is the vehicle's heading angle. A navigation task is to obtain the environment information, d_i and P_g(X_g, Y_g), and the vehicle's configuration S(t) at each time step t, and to determine the output variables v(t) and Δθ(t).
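The outputs v(t) and Δθ(t) drive the configuration update from one time step to the next. A minimal sketch of that update, assuming a simple unicycle-style kinematic model with step period T (an illustrative assumption; the paper's three-wheeled platform may use different kinematics):

```python
import math

def update_configuration(x, y, theta, v, dtheta, T=1.0):
    """Advance the configuration S = (X0, Y0, theta) by one time step.

    v: translational velocity (cm/s); dtheta: change in heading (rad);
    T: time between two steps (s). Unicycle-style model, assumed for
    illustration only.
    """
    theta_new = theta + dtheta               # apply the steering change
    x_new = x + v * T * math.cos(theta_new)  # advance along the new heading
    y_new = y + v * T * math.sin(theta_new)
    return x_new, y_new, theta_new

# Vehicle at the origin heading along +X, moving straight at 14 cm/s
print(update_configuration(0.0, 0.0, 0.0, 14.0, 0.0))  # -> (14.0, 0.0, 0.0)
```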
Overview of the Navigator Architecture of the Navigator Consists of four main modules: Obstacle Avoider (OA), Goal Seeker (GS), Navigation Supervisor (NS), and Environment Evaluator (EE). The OA determines v_a and Δθ_a. The GS determines v_g and Δθ_g. The NS fuses these two pairs of values to obtain the eventual v and Δθ. The EE computes the distances sensed by the ultrasonic sensors and determines the value of W, which is used for fuzzification by the OA.
Overview of the Navigator (figure: architecture of the navigator)
Obstacle Avoider Fuzzy Control of Obstacle Avoidance The input variables of the OA are the sensor readings d_i. The outputs are v_a and θ_a. Membership functions (as in figure).
Obstacle Avoider Fuzzification of input variables The value of each d_i is fuzzified and expressed by the fuzzy sets VN, NR, and FR. Rule base construction through reinforcement learning Fuzzy Inference Fire strength of the j th rule: μ_j = μ_Dj1(d_1) · μ_Dj2(d_2) · … · μ_Djn(d_n)
Obstacle Avoider Defuzzification of the output variables Method of height defuzzification (low computing cost): v_a = Σ_j b_1j·μ_j / Σ_j μ_j, θ_a = Σ_j b_2j·μ_j / Σ_j μ_j
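Fuzzification, the product fire strength, and height defuzzification can be sketched together. The triangular membership shapes, the VN/NR/FR breakpoints, and the two example rules below are illustrative assumptions, not the tuned values from the paper:

```python
def trimf(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b
    (illustrative shape; the actual VN/NR/FR functions may differ)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Hypothetical fuzzy sets over a sensor distance d_i in cm
SETS = {"VN": (0, 0, 60), "NR": (30, 60, 90), "FR": (60, 120, 120)}

# Hypothetical rules: (antecedent sets, one per sensor, b_1j, b_2j)
RULES = [(("NR", "NR"), 10.0, 0.1), (("VN", "NR"), 5.0, -0.2)]

def fire_strength(distances, antecedent):
    """mu_j = mu_Dj1(d_1) * mu_Dj2(d_2) * ... * mu_Djn(d_n)."""
    mu = 1.0
    for d, name in zip(distances, antecedent):
        mu *= trimf(d, *SETS[name])
    return mu

def height_defuzzify(rules, distances):
    """Height method: v_a = sum_j(b_1j * mu_j) / sum_j(mu_j),
    and theta_a likewise with b_2j."""
    mus = [fire_strength(distances, ant) for ant, _, _ in rules]
    total = sum(mus)
    if total == 0.0:
        return 0.0, 0.0  # no rule fired
    v_a = sum(m * b1 for m, (_, b1, _) in zip(mus, rules)) / total
    theta_a = sum(m * b2 for m, (_, _, b2) in zip(mus, rules)) / total
    return v_a, theta_a

print(height_defuzzify(RULES, [50.0, 70.0]))
```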
Obstacle Avoider Rule Learning for Obstacle Avoidance The vehicle begins learning with an initial v and θ at time step t = 0. It moves into a new position at time step t = 1, and so on, until a collision occurs at time step t = k. The whole process up to the collision is called a trial, and the time step t (t > 0) is called the t th learning step. A failure signal is fed back to the learning network, and the rules that were used at the previous time steps k, k - 1, k - 2, … are changed in order to improve the vehicle's performance.
Obstacle Avoider After the rules are updated, a new trial begins at the (k+1) th learning step. The process is iterated, terminating when no more collisions occur.
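The trial structure can be sketched with a toy stand-in for the rule base; the 'skill' counter and the collision model below are purely illustrative, not the actual learning network:

```python
def train_until_no_collision(initial_skill=0, trial_len=50, max_trials=1000):
    """Toy sketch of trial-based learning: each trial runs step t = 0, 1, ...
    until a collision at step k; the failure signal then improves the rules
    that were just used; training ends when a whole trial is collision-free.
    Here 'skill' stands in for the quality of the learned rule base."""
    skill = initial_skill
    for trial in range(max_trials):
        collided = False
        for t in range(trial_len):
            if t >= skill:       # toy model: the vehicle survives
                collided = True  # 'skill' steps before it fails
                break
        if not collided:
            return trial, skill  # a full collision-free trial: done
        skill += 1               # failure signal updates the used rules
    raise RuntimeError("training did not converge")
```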
Obstacle Avoider Simulation of Rule Learning for Obstacle Avoidance For efficient learning, a tradeoff between exploration and exploitation should be achieved to maximize the effect of learning and minimize the cost of exploration. Environment Exploration Method (EEM) A straightforward and simple method. Explores and converges slowly in a complex environment. New Training Method Phase 1: The vehicle begins its training from an arbitrarily chosen start configuration and moves in a specific direction (learning is iterated as in the EEM). Phase 2: The vehicle then learns to navigate in the opposite direction from a new start configuration. Upon collision, it backtracks some steps and changes direction accordingly. Both phases are complete when the vehicle maintains its trajectory without collision.
Performance Analysis The navigator is embedded in a fully integrated and interactive simulator developed on the SGI IRIX operating system and the Open Inventor platform. Three cases are compared: t1 and t2 are the trajectories determined by the vehicle using the rule base constructed by the new training method with and without the EE, respectively, and t3 is the trajectory determined by the vehicle using the rule base constructed by the EEM (terminated at 100,000 learning steps with W = 60 cm).
Motions of t1, t2, t3 (figure: top view of a laboratory and the trajectories from s1 to g1)
Analysis of t1's motion (figure)
Analysis of t1's motion s1 - The vehicle was moving at a velocity of about 14 cm/s. a - Turned its heading slightly towards the goal g1 with a small drop in velocity. b - Accelerated and passed by the door on its right. c - Encountered another AMV; the vehicle slowed down to below 12 cm/s before making a relatively large steering change to avoid the AMV. d - Accelerated to top speed when passing the table (TB). e - Detected the presence of a human being, decelerated, and steered to the left. f - Accelerated when passing the bookshelf (BS).
Analysis of t1's motion g - Decelerated when approaching the two human beings. h - Slowed down to about 9 cm/s when it was directly in front of HB3, before making a turn to the right. Selected the path between HB2 and HB3 and navigated through. g1 - Decelerated gradually until coming to a stop.
Observations - t1's motion Acceleration/deceleration ranges are small when the vehicle passes by an obstacle but large when obstacles are in its path. No abrupt change of velocity (within ±3 cm/s). No abrupt change in the steering angle (within ±11.5º). t2 - The velocity and steering-angle functions are very similar to those of t1 even though the EE was not used. t3 - Abrupt changes in velocity and steering angle: velocity changes varied between ±6 cm/s and the steering angle varied between +40º and -29º.
Evaluation of Path Quality Six navigation tasks were conducted and the errors are tabulated. p_a - Length of the actual path. p_e - Length of the shortest path. d_ae - Deviation of the vehicle's position from the shortest path. E_r - Relative error = (p_a - p_e) / p_e
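The relative-error definition is easy to check against the results table; for task 1, p_a = 833.7 cm and p_e = 785.6 cm give E_r of about 6.1%:

```python
def relative_error(p_a, p_e):
    """E_r = (p_a - p_e) / p_e, with p_a the actual path length and
    p_e the shortest path length (both in cm)."""
    return (p_a - p_e) / p_e

# Task 1 from the results table
print(round(100 * relative_error(833.7, 785.6), 1))  # -> 6.1 (percent)
```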
Evaluation of Path Quality NAVIGATION UNDER THE RULE BASE CONSTRUCTED BY THE EEM
Evaluation of Path Quality T Pa/cm Pe/cm Er(%) Average dae(cm) Max dae Time(s) obstacles Colloision 1 833.7 785.6 6.1 14.8 40.1 73.8 7 0 2 795.0 759.8 4.6 9.8 33.8 70.2 6 0 3 682.5 660.1 3.4 7.6 28.6 62.1 5 0 4 584.3 573.0 2.0 4.3 13.2 54.6 4 0 5 493.7 485.0 1.8 3.8 10.0 49.5 2 0 6 424.1 422.9 1.2 3.1 8.5 45.6 0 0 NAVIGATION UNDER THE RULE BASE CONSTRUCTED BY THE NEW METHOD
Conclusion The fuzzy navigator performs well in complex and unknown environments, using a rule base learned from a simple corridor-like environment. Fuses the obstacle avoidance and goal seeking behaviors. Aided by an environment evaluator that tunes the universe of discourse of the input sensor readings and enhances its adaptability. Five distinct advantages over the EEM: 270 times faster learning speed. Only 4% of the learning cost. Very reliable convergence of learning. 98.8% of rules learned. High adaptability.
References N. H. C. Yung and C. Ye, "An intelligent mobile vehicle navigator based on fuzzy logic and reinforcement learning," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 29, no. 2, pp. 314-321, 1999.
Thank You.
Supporting Equations (ACE/ASE Learning) Trace of the j th rule: trace_j(t) = λ·trace_j(t - T) + (1 - λ)·μ_j(t - T), where λ is the trace decay rate and T is the time between two steps. Internal reinforcement: r̂_m(t) = r_m(t) + γ·p_m(t) - p_m(t - T), where γ is the discount-rate parameter. ACE weight update: v_mj(t + T) = v_mj(t) + β·r̂_m(t)·trace_j(t), where β is a positive constant.
ASE weight update: ω_mj(t + T) = ω_mj(t) + α·r̂_m(t)·e_mj(t), where α is the learning rate and e_mj(t) is the eligibility trace of the j th rule. Eligibility trace: e_mj(t) = δ·e_mj(t - T) + (1 - δ)·y_m(t - T)·μ_j(t - T), where δ is the decay rate of eligibility.
EEM Iteration
Step 1: The current distance readings are fed into the fuzzy quantization module, where they are encoded into μ_j(t).
Step 2: If max(ω_mj(t)) = 0, the initial control actions, v_a and θ_a, are used as the control outputs; otherwise, the control outputs are determined by defuzzification.
Step 3: The external reinforcement signal, the current prediction value p_m(t), and the internal reinforcement signal are calculated by the previous equations.
Step 4: The weights of the ACE and ASE (v_mj(t) and ω_mj(t)) are updated, and the trace of the rule and the eligibility trace are also updated.
Step 5: If there is no collision, the configuration of the vehicle is changed by the last equation and the learning process returns to Step 1. If a collision occurs, i.e., r_m(t) = -1, then v_mj(t), trace(t), p_m(t - 1), and e_mj(t) are reset to zero. The vehicle is backtracked 4 steps and its heading direction is reversed. The ASE weights ω_mj(t) learned just before the collision are then used for the next trial, which begins by repeating Step 1 through Step 5 again.
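One ACE/ASE learning step consistent with the iteration above can be sketched as follows. The update equations are the standard Barto-style actor-critic forms that the slides' symbols (α, β, γ, λ, δ) suggest; the constants and exact forms are illustrative assumptions, not the paper's tuned values:

```python
class ActorCritic:
    """ACE (critic) and ASE (actor) weights for one behavior m, one
    weight per fuzzy rule j. Sketch only; the paper's exact equations
    and constants may differ."""

    def __init__(self, n_rules, alpha=0.5, beta=0.2, gamma=0.9,
                 lam=0.8, delta=0.6):
        self.v = [0.0] * n_rules      # ACE weights v_mj
        self.w = [0.0] * n_rules      # ASE weights w_mj
        self.trace = [0.0] * n_rules  # trace of the j-th rule
        self.e = [0.0] * n_rules      # eligibility trace e_mj
        self.p_prev = 0.0             # prediction at the previous step
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.lam, self.delta = lam, delta

    def step(self, mu, r_ext, y):
        """mu: fire strengths mu_j(t); r_ext: external reinforcement
        (-1 on collision, else 0); y: actor output used at this step."""
        p = sum(vj * mj for vj, mj in zip(self.v, mu))  # prediction p_m(t)
        r_hat = r_ext + self.gamma * p - self.p_prev    # internal signal
        for j, mj in enumerate(mu):
            self.v[j] += self.beta * r_hat * self.trace[j]   # ACE update
            self.w[j] += self.alpha * r_hat * self.e[j]      # ASE update
            self.trace[j] = self.lam * self.trace[j] + (1 - self.lam) * mj
            self.e[j] = self.delta * self.e[j] + (1 - self.delta) * y * mj
        if r_ext < 0:  # collision: reset everything except the ASE weights
            n = len(self.v)
            self.v, self.trace, self.e = [0.0] * n, [0.0] * n, [0.0] * n
            self.p_prev = 0.0
        else:
            self.p_prev = p
        return r_hat
```

On a collision the ASE weights w_mj are deliberately kept, matching Step 5 above: the actor keeps what it learned just before the failure while the critic's state is cleared for the next trial.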