UNSUPERVISED MACHINE LEARNING EMBEDDED IN AUTONOMOUS INTELLIGENT SYSTEMS

Ramón García Martínez (1,2,3) & Daniel Borrajo (4)

1.- Artificial Intelligence Department. School of Computer Science. Polytechnic University of Madrid.
2.- Intelligent Systems Laboratory. Computer Science Department. School of Engineering. University of Buenos Aires.
3.- Artificial Intelligence Laboratory. Buenos Aires Institute of Technology.
4.- Artificial Intelligence Department. Carlos III University.
rgm@itba.edu.ar

Keywords: Unsupervised machine learning, embedded machine learning, autonomous intelligent systems, theory formation, theory revision.

Abstract: An overview of the architecture of the system is given; the sensor subsystem and the process of building local theories are defined; the planner that the system uses to build plans that allow it to reach its self-proposed objectives is introduced; the ponderator, which provides a measure of how good or how bad a generated plan is, is presented; finally, some experimental results are described.

1. Introduction

Machine Learning [Carbonell et al., 1983; 1986; Kodratoff & Carbonell, 1990] is a field of Artificial Intelligence that has gained momentum in recent years. Among the different types of Machine Learning techniques, those based on observation and discovery are the best models of human behavior. From this point of view, it is interesting to study the ways in which an autonomous system [Fritz, 1984; Fritz et al., 1989; García Martínez, 1990] can automatically build theories that model its environment. These theories can be used by the system to improve its behavior in response to changes in its environment. Theory formation by heuristic mutation of observations [García Martínez, 1992; 1993a; 1993c] has been proposed as a Machine Learning technique based on observation and active experimentation for autonomous intelligent systems [García Martínez, 1993b; 1993d].
This paper proposes the integrated architecture of a system that, with elementary components, shows autonomous intelligent behavior. The architecture is aimed at solving the machine learning problem of modelling the environment (that is, what the consequences are of applying given actions to the environment) starting from no previous knowledge. To accelerate the convergence of the learned model, the use of heuristics is proposed; to handle the contradictions among the generated theories, the use of estimators of the probability distribution is proposed. The problem of building theories is considered and its limits are defined; a method of building theories based on applying mutation heuristics to the generated theories is proposed and each heuristic is defined; the limits of the theory-weighting problem are defined and a weighting method is proposed.
An overview of the experimental work is presented: the experimental design and the variables are described, the results are shown graphically and an interpretation of them is provided. The conclusions indicate the original contributions of the paper, describe some limitations that were found and outline future research work.

2. General System Description

2.1 Environment

The robot model used by Korsten, Kopeczak and Szapakowizc [1989] was selected; it describes the behavior of autonomous agents when facing various scenarios. The description of the scenarios is based on the models suggested by [Lozano-Perez and Wesley, 1979; Iyengar et al., 1985; Gil de Lamadrid and Gini, 1987; Mckendrick, 1988; Dudek et al., 1991; Borenstein and Korent, 1991; Evans et al., 1992], which use two-dimensional scenarios to study learning, planning and simulation processes. The environment can be represented as a matrix in which each element stands for a portion of space, which can be an obstacle, an energy point or a point of space that the robot can pass through.

2.2 Architecture

The system can be described as an exploring robot that perceives its environment. From the situation in which it finds itself, the system attempts to determine a sequence of actions, called a plan, that allows it to reach a nearby objective. This sequence is presented to the plan evaluator, which determines its acceptability. The controller of plans in execution is responsible for verifying that the plan is carried out successfully. Every movement of the robot is accompanied by a description of its environment; the combination of the described situation, the action applied to it and the resulting situation constitutes what the system learns. If that knowledge has already been learnt, it is reinforced; otherwise it is incorporated and mutant theories are generated.
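The reinforce-or-incorporate step at the end of Section 2.2 can be illustrated with a minimal sketch. It assumes that an experience is stored as a (situation, action, resulting situation) triple and that reinforcement is a simple count. The names `learn` and `memory` are hypothetical. The generation of mutant theories is left out because the mutation heuristics are not specified here.

```python
from collections import Counter
from typing import Hashable


def learn(memory: Counter, situation: Hashable, action: Hashable,
          result: Hashable) -> bool:
    """Record one observed experience (assumed representation: a triple).

    Returns True if the experience was already known and has been
    reinforced, False if it has just been incorporated. Generation of
    mutant theories is deliberately not modelled in this sketch.
    """
    experience = (situation, action, result)
    already_known = experience in memory
    memory[experience] += 1  # a missing key counts from zero
    return already_known


memory = Counter()
learn(memory, "s0", "forward", "s1")   # first time: incorporated
learn(memory, "s0", "forward", "s1")   # second time: reinforced
```

A counter keyed by the whole triple keeps contradictory experiences (same situation and action, different result) as separate entries, which is the case the weighting of theories has to deal with.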
2.3 Sensor System

The model of the sensor system was adapted from the one proposed by [Mahadevan and Connell, 1992], who suggested a system of 24 sectors distributed in three levels. García Martínez [1992c] suggested that the model should have 8 sectors arranged in two levels and distributed in three regions: Left Lateral, Frontal and Right Lateral. The Frontal region is divided vertically into two subregions. Each region has two range levels: a near sensing level and a distant sensing level. The sensor system therefore has eight sectors; each sector corresponds to a binary value, and these values are gathered in a set that describes the perception of a situation.
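The eight-sector binary perception can be sketched as follows. The sector names, their ordering and the meaning of a set bit ("something is detected in that sector") are assumptions made for illustration; the text only fixes the count of four subregions times two range levels.

```python
from typing import Dict, Tuple

# Hypothetical names: the source gives three regions, with the frontal
# one split in two, and two range levels per region (4 x 2 = 8 sectors).
SUBREGIONS = ("left", "front_left", "front_right", "right")
LEVELS = ("near", "far")
SECTORS: Tuple[Tuple[str, str], ...] = tuple(
    (subregion, level) for level in LEVELS for subregion in SUBREGIONS
)


def encode_perception(readings: Dict[Tuple[str, str], bool]) -> Tuple[int, ...]:
    """Return the situation as 8 bits, one per sector (1 = detection)."""
    return tuple(int(bool(readings.get(sector, False))) for sector in SECTORS)


def to_bitstring(situation: Tuple[int, ...]) -> str:
    """Render a situation as a string such as '01000001'."""
    return "".join(str(bit) for bit in situation)


readings = {("front_left", "near"): True, ("right", "far"): True}
print(to_bitstring(encode_perception(readings)))  # prints 01000001
```

Using an immutable tuple for the situation makes it hashable, so it can serve directly as a key when situations are compared or stored.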
3. Theory Formation

The structural model for theory formation was developed by García Martínez [1991a, 1991b, 1992a, 1992b, 1992c]. It is an extension of the model of Fritz [Fritz et al., 1989], in which a unit of experience consisted of [Initial Situation, Action, Final Situation]. García Martínez proposes adding coefficients that make it possible to determine the acceptability of a theory. The proposed experience model has the following structure:

Ti = SI, A, SF, P, K, U

Ti   Theory
SI   Initial Situation
A    Action
SF   Final Situation
P    Number of times that the theory Ti concluded successfully (the expected final situation was obtained)
K    Number of times that the theory Ti was used
U    Usefulness level reached by applying the action to the initial situation of the theory

The usefulness of a theory is assumed to reflect an estimated measure of the distance from the robot to the nearest energy point. Given the Initial Situation perceived by the autonomous intelligent system at an instant of time (say T1), applying an action leads to the Final Situation (at an instant T2). Taking the presented theory model into account, given a theory Ti with its supposed conditions, its action, its predicted effects, the number of times P that the theory was applied successfully and the number of times K that the theory was applied, and with S the situation to which the action is applied and S' the situation that is reached, the weighting method is detailed by the following algorithm:

4. Planning

The planner is the mechanism through which the system builds a plan (a succession of actions) that allows it to reach its objectives efficiently. Each time the system is in a particular situation, it tries to build a plan that, applied to the current situation, allows the system to reach a desirable situation. If such a plan exists, the desirable situation becomes an objective situation.
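The theory record of Section 3 and the idea of chaining theories into a plan can be sketched together. The field names follow the structure SI, A, SF, P, K, U; everything else is an assumption made for illustration. In particular, breadth-first search is swapped in here as a simple way to find a chain of theories, and the names `Theory`, `success_ratio` and `find_plan` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Hashable, List, Optional, Sequence


@dataclass
class Theory:
    si: Hashable      # SI: initial situation
    action: str       # A: action
    sf: Hashable      # SF: final (predicted) situation
    p: int = 1        # P: times the theory concluded successfully
    k: int = 1        # K: times the theory was used
    u: float = 0.0    # U: usefulness level

    @property
    def success_ratio(self) -> float:
        """Quotient P/K of the theory."""
        return self.p / self.k


def find_plan(theories: Sequence[Theory], current: Hashable,
              desirable: Hashable) -> Optional[List[Theory]]:
    """Breadth-first search for a chain of theories leading from the
    current situation to the desirable one; None if no chain exists."""
    if current == desirable:
        return []
    frontier = deque([current])
    reached_by = {current: None}  # situation -> theory that led to it
    while frontier:
        situation = frontier.popleft()
        for theory in theories:
            if theory.si != situation or theory.sf in reached_by:
                continue
            reached_by[theory.sf] = theory
            if theory.sf == desirable:
                plan = []
                step = theory
                while step is not None:  # walk back to the start
                    plan.append(step)
                    step = reached_by[step.si]
                return plan[::-1]
            frontier.append(theory.sf)
    return None


theories = [
    Theory("a", "forward", "b", p=3, k=4),
    Theory("b", "left", "c"),
    Theory("a", "right", "d"),
]
plan = find_plan(theories, "a", "c")
print([t.action for t in plan])  # prints ['forward', 'left']
```

A return value of `None` corresponds to the case in which no plan reaches the chosen desirable situation, so the caller would try another desirable situation or fall back on a contingency plan.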
It may happen that no plan exists for a desirable situation, in which case another situation must be selected as desirable; if none of the desirable situations can be reached by a plan, then one of the contingency plans is executed. From all the known situations, the planner builds the stack of desirable situations. A situation is known if it is registered as a situation of some theory in the set of theories. The stack of desirable situations consists of the predicted effects of the theories with the greatest usefulness levels, ordered by decreasing usefulness level. Each situation can be interpreted as the result of having applied an action to a preceding situation. This determines the precedence among the
different situations, from which the graph of situations is built. Once the graph is built, the planner proceeds to find a plan between the current situation and the desirable situation. If no plan is found, another situation is taken from the stack of desirable situations, and so on until there are no more desirable situations or a plan is found. If the stack is emptied and no plan was obtained, contingency plans are generated. If at least one plan was found, it is weighted to obtain its acceptability.

5. Ponderation of Plans

The knowledge that the system possesses at a given time, as a set of theories, can be viewed as a model of how its environment will react to the system's actions. The quotient P/K of a given theory can be taken as an estimator of the probability [Calistri-Yeh, 1990] that the action Ak applied to a situation Si yields the resulting situation Sj, where the situation Sj verifies the predicted effects of the theory under consideration. Therefore, the knowledge that the system has at a given instant can be thought of as the transition matrix Mk of the action Ak, which has at position (i, j) the quotient P/K of the theory with supposed conditions Si, predicted effects Sj and action Ak.

6. Experiments

The simulated autonomous system responded successfully in the experiments that were run. In the environment, the system showed the capacity to understand and exhibit intelligent behaviors while recording experience (theories) of the environment in which it was immersed. The presentation of the experimental results in this paper follows the structure proposed by Matheus [1990, 1990b]. The results of the experiments are shown in the following graphic:

[Graphic: four curves of average successful plans over time]
Axis Y: Average of Successful Plans (0 to 90)
Axis X: Time (0 to 8000)
%P.E. BASE: Average of Successful Plans without mutation and ponderation
%P.E. C/A: Average of Successful Plans with mutation
%P.E. C/P: Average of Successful Plans with ponderation
%P.E. C/(A+P): Average of Successful Plans with mutation and ponderation

These results empirically show that:
1. Theory mutation by heuristics improves the system's behavior.
2. The incorporation of a weighting mechanism for planning makes it possible to specify the acceptability of plans, increasing the number of plans concluded successfully.

7. Conclusions

An overview of the architecture of a system with an embedded machine learning mechanism based on theory mutation and theory ponderation has been given; estimators that provide a measure of how good or how bad a generated plan is have been proposed; and the improvement in the behavior of the system due to the embedded unsupervised learning mechanism has been shown experimentally.

8. References

Carbonell, J., Michalski, R. and Mitchell, T. 1983. Machine Learning: The Artificial Intelligence Approach. Vol. I. Morgan Kaufmann.

Carbonell, J., Michalski, R. and Mitchell, T. 1986. Machine Learning: The Artificial Intelligence Approach. Vol. II. Morgan Kaufmann.

Fritz, W. 1984. The Intelligent System. ACM SIGART Newsletter. No. 90. October.

Fritz, W., García Martínez, R., Blanqué, J., Rama, A., Adobba, R. and Sarno, M. 1989. The Autonomous Intelligent System. Robotics and Autonomous Systems. Vol. 5. No. 2. pp. 109-125. Elsevier.

García Martínez, R. 1990. Un Algoritmo de Generación de Estrategias para Sistemas Inteligentes Autonomos. Proceedings II Iberoamerican Congress on Artificial Intelligence. pp. 669-674. LIMUSA. Mexico.

García Martínez, R. 1992. Aprendizaje Basado en Formación de Teorías sobre el Efecto de las Acciones en el Entorno. Master Thesis. Artificial Intelligence Department. School of Computer Science. Polytechnic University of Madrid.

García Martínez, R. 1993a. Aprendizaje Automático basado en Método Heurístico de Formación y Ponderación de Teorías. Tecnología. Vol. 15. pp. 159-182. Brazil.

García Martínez, R. 1993b. Heuristic theory formation as a machine learning method. Proceedings VI International Symposium on Artificial Intelligence. pp. 294-298. LIMUSA. Mexico.

García Martínez, R. 1993c. Heuristic-based theory formation and a solution to the theory reinforcement problem in autonomous intelligent systems. Proceedings III Argentine Symposium on Artificial Intelligence. pp. 101-108. Science and Technology Secretary Press. Argentina.

García Martínez, R. 1993d. Measures for theory formation in autonomous intelligent systems. Proceedings RPIC'93. pp. 451-455. Tucumán University Press. Argentina.

Kodratoff, I. and Carbonell, J. 1990. Machine Learning: The Artificial Intelligence Approach. Vol. III. Morgan Kaufmann.

Matheus, C. 1990. Feature Construction: An Analytic Framework and an Application to Decision Trees. Ph.D. Thesis. Graduate College of the University of Illinois at Urbana-Champaign.