Game Analysis and Control by Means of Continuously Learning Networks

Jürgen Perl
University of Mainz, Institute for Computer Science

Abstract

The paper deals with the question whether and how the process of learning can be modelled, analysed and perhaps improved by means of Neural Networks. The problem is that most of the developed network types are intended for restricted technical purposes only. Their similarity to neural brain structure is reduced to some basic formulas describing algorithms of static learning in the sense of pattern recognition and activity selection. Using a more dynamic approach, however, it seems possible to model dynamic aspects of learning and decision processes as well. One step in this direction has been taken with the Dynamically Controlled Network DyCoN, which was developed on the basis of a conventional Kohonen Feature Map (KFM) in order to support continuous learning, and which has been tested successfully in different areas of sport during the last couple of years. The approach presented here shall, on the one hand, demonstrate the performance that is induced by the ability of continuous learning. On the other hand, an outlook sketches possible future developments of this approach and the kind of basic work that has to be done in order to better understand what is happening in a game.

1 Introduction

The basic idea of the approach, briefly spoken, is as follows: On the one hand, learning is an adaptive physiological process that reduces a lack of information or knowledge. On the other hand, pattern learning is an important ability for understanding and controlling behavioural processes, as are known in particular from sport. So, models of physiological adaptation and models of pattern learning could be combined into physiological adaptation models of pattern learning. Such a model could then help to understand and improve e.g. processes of motor learning as well as tactical processes in games.
To this aim, the following chapter 2 introduces one model of physiological adaptation (PerPot) and one of dynamic pattern learning (DyCoN), which have been developed by the author during the last couple of years and which have been successful in several applications (see Perl (2001a), (2001d), (2002)). Chapter 3 presents some of these examples from different areas of application. Chapter 4 deals with some current approaches and projects, the most spectacular (and slightly speculative) one being to control the behaviour of simulated football players (so-called "multi-agents"), which act on the basis of their own "understanding" of the world they perceive. The long-term aim is to develop agents and/or robots that are capable of "learning by doing" and so can be helpful as assistants not only in technical areas but in many applications, ranging from sport through health and care to medicine.
2 Pattern learning as an adaptive process

2.1 PerPot: Antagonistic metamodel of physiological adaptation

The metamodel PerPot (i.e. Performance Potential) describes physiological adaptation on a very abstract level as an antagonistic process, as is shown in figure 1: An input flow (usually called "load" rate) feeds a strain potential as well as a response potential identically. From the response potential the performance potential is increased by a positive flow, while the strain potential reduces it by a negative flow. Additionally, we have the following two effects: If the strain potential is filled over its upper limit, it produces an overflow, which acts on the performance potential as a reducing negative flow. In turn, the difference between the upper limit of the strain potential and its current level indicates how far the situation is from such a dangerous overflow. We call that difference the "reserve" of the system. Finally, in order to model atrophy, the performance potential continuously loses substance. For mathematical reasons, this loss has to be fed back to the response potential to preserve the potential balance of the system. (From the mathematical point of view, the load rate only plays the role of a pump, which moves the system potential around without violating this balance property.) All flows show specific delays modelling the time the components need to act. Delays are model parameters on which the model behaviour depends in a quite characteristic way: DR is the delay in the response flow, DS the delay in the strain flow, DSO the delay in the strain overflow, and DA the delay in the atrophy flow.

Figure 1: PerPot: Structure and parameters

Figure 2: Characteristic types of PerPot behaviour (adaptation for DS < DR and DS > DR, atrophy, collapse)
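The flow structure just described can be made concrete with a small simulation. The discrete update rules, parameter values and initial conditions below are assumptions for illustration, not the author's published PerPot equations:

```python
# A minimal discrete-time sketch of the PerPot flow structure described above.
# Update rules and parameter values are illustrative assumptions only.

def perpot(loads, DS=5.0, DR=2.0, DSO=1.5, DA=30.0, limit=1.0):
    """Simulate the antagonistic potentials; returns the performance curve."""
    sp = rp = pp = 0.0                         # strain / response / performance
    performance = []
    for load in loads:
        sp += load                             # load feeds the strain potential ...
        rp += load                             # ... and the response potential identically
        flow_s = sp / DS                       # delayed negative (strain) flow
        flow_r = rp / DR                       # delayed positive (response) flow
        overflow = max(0.0, sp - limit) / DSO  # overflow once the limit is exceeded
        atrophy = pp / DA                      # continuous loss of performance ...
        rp += atrophy                          # ... fed back to the response potential
        sp -= flow_s + overflow
        rp -= flow_r
        pp += flow_r - flow_s - overflow - atrophy
        performance.append(pp)
    return performance
```

Running such a sketch with different load levels and different relations between DS and DR lets one explore regimes like the ones sketched in figure 2.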
In figure 2 three types of characteristic PerPot behaviour are shown: The type of normal adaptation depends on the relation between the delays DS and DR. Too little load can effect atrophy; too much load can effect overflow and collapse. (See Mester et al. (2000), Perl (2001c).)

2.2 DyCoN: PerPot-driven Neural Network

We use Neural Networks of type "Kohonen Feature Map" (KFM) for the learning of patterns. The idea, briefly spoken, is (see figure 3) that so-called signals, which are components of a complex pattern, change the topology of the network, i.e. the neurones or nodes rearrange in order to approximate the signal impulse (see Perl (2001b)).

Figure 3: Kohonen Feature Map, excited by a signal

The result finally is (see figure 4) that signals representing similar patterns effect clusters of neurones, which in turn are separated from other clusters representing different patterns.

Figure 4: Classes of similar patterns, represented by clusters of neighboured neurones

So far, KFMs are very helpful for classifying process patterns and have successfully been used to analyse game processes in soccer, volleyball, and squash (see Lames & Perl (1999), Wünstel et al. (1999), Perl & Lames (2000)). The problem, however, is that KFMs on the one hand need a lot of information for learning patterns and on the other hand are not able to learn dynamically. This means that once the learning process is closed, the network can only be used for testing and is not able to learn any more.
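The topological rearrangement can be illustrated with the generic Kohonen update rule; grid size, learning rate and neighbourhood radius below are invented for the sketch:

```python
import numpy as np

# Sketch of Kohonen training on a small grid: the best-matching neurone and
# its grid neighbours move towards each signal. Parameters are illustrative.

rng = np.random.default_rng(0)
grid_h, grid_w, dim = 10, 10, 2
weights = rng.random((grid_h, grid_w, dim))          # neurone weight vectors
coords = np.dstack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                               indexing="ij"))       # grid positions

def train_step(signal, lr=0.3, radius=2.0):
    """Move the best-matching neurone and its neighbours towards the signal."""
    dists = np.linalg.norm(weights - signal, axis=2)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)    # best matching unit
    grid_d = np.linalg.norm(coords - np.array(bmu), axis=2)  # grid distance to BMU
    h = np.exp(-(grid_d ** 2) / (2 * radius ** 2))           # neighbourhood kernel
    weights[...] += lr * h[..., None] * (signal - weights)
    return bmu

for _ in range(200):
    train_step(rng.random(2))
```

After enough signals from the same region, the weight vectors of neighbouring neurones approximate that region, which is exactly the cluster formation of figure 4.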
These problems were the reason to improve the original KFM approach by combining it with the dynamic adaptive concept of PerPot, resulting in the type of Dynamically Controlled Network DyCoN. Without going into technical details: every neurone of such a DyCoN is controlled by its own specific PerPot, enabling an individual learning behaviour of each single neurone. DyCoN can learn continuously over time and can continue learning processes after interruptions. Besides other advantages, it can therefore learn using only very small amounts of specific data, e.g. if they are used to coin networks that are already pre-trained with similar or virtually generated data. Moreover, as the following figures demonstrate, DyCoN inherits specific properties from the controlling PerPots and so can be used for modelling learning processes under the aspect of adaptation.

In detail, figure 5 shows on the right-hand side a network with two coined patterns, each represented by a collection of equally coloured squares. The red one was learned first, reaching a certain level of "presence", meaning something like the probability of getting a positive response when testing the pattern on the network prepared this way. After the red pattern the green one was learned, reaching the same level of presence the red one had reached before. Note that the green pattern increases its level of presence already during the learning phase of the red one. This is due to the fact that the two patterns are not disjoint but have some components in common. Finally, "training intensity" characterises the level of "load" the neurone-controlling PerPots are run with, which can be used to speed up or slow down the learning process.
Figure 5: Cooperative pattern training with a DyCoN (presence of the first and second pattern and training intensity over training steps)

Figure 6: Replacing pattern training with a DyCoN (presence of the first and second pattern and training intensity over training steps)
In contrast to the cooperative training from figure 5, figure 6 shows a replacing training, where the first pattern is erased during the training process of the second one. The reason is that the training intensity was reduced when the training of the second pattern was started. So, together with a rather high starting presence of the second pattern and an atrophy of the first pattern, the second one becomes dominant although it is learned at a rather low intensity.

Finally, figure 7 demonstrates how high training intensity can speed up learning. However, this approach, in reality also well known as over-training, can eventually result in breakdowns, which are caused by collapse situations in the neurone-controlling PerPots.

Figure 7: Speed-up and breakdown on a DyCoN, effected by a collapse caused by high intensity

The behavioural phenomena of DyCoN presented above can be used in two ways. On the one hand they can support the learning and analysis of patterns given as data from original games or other time-dependent processes (see chapter 3.1). Here DyCoN in particular helps to handle small data amounts and/or complex data structures (see chapters 3.2 and 3.3). Moreover, continuous learning enables an adaptation of the network to changing situations (such as tactical situations of teams) and so helps to analyse time-dependent development processes. On the other hand, the similarity between the learning phenomena of DyCoN and those of learning individuals could be a reason first to check whether DyCoN can in fact be used for modelling learning processes (validation) and, if so, then to model such learning behaviour in order to better understand and improve the respective processes (calibration). First steps in this direction are discussed in chapter 4.
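The coupling of neurones and PerPots can only be sketched schematically here, since the paper does not give the DyCoN update rules; the controller below is a toy stand-in with invented parameters:

```python
# Toy sketch of the DyCoN idea: each neurone owns a small PerPot-like
# controller that turns accumulated training load into an individual
# learning rate. Update rules and parameters are illustrative assumptions,
# not the actual DyCoN mechanism.

class NeuroneController:
    def __init__(self, DS=6.0, DR=3.0):
        self.sp = 0.0                  # strain potential of this neurone
        self.rp = 0.0                  # response potential of this neurone
        self.DS, self.DR = DS, DR

    def learning_rate(self, load):
        """Feed training load, return the neurone's current adaptive rate."""
        self.sp += load
        self.rp += load
        rate = max(0.0, self.rp / self.DR - self.sp / self.DS)
        self.sp -= self.sp / self.DS   # delayed decay of strain
        self.rp -= self.rp / self.DR   # delayed decay of response
        return rate

# Each neurone of the map would get its own controller, so frequently hit
# neurones can respond differently from rarely hit ones:
controllers = [NeuroneController() for _ in range(100)]
```

The point of the construction is that the learning rate is no longer a global, monotonically decreasing schedule as in a conventional KFM, but a local, load-dependent quantity, which is what makes interruption and continuation of training possible.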
3 Examples: DyCoN-based pattern learning and analysis

3.1 Clusters as quantitative process patterns of games

In the first example, which is presented in figure 8, a game process in football is modelled as a sequence of positions on the playing field, where "position" can mean that of players as well as that of the ball (or even of the referee). In order to keep the number of possible situations small, the positions are reduced to the five main values "right" (1), "centre right" (2), "centre" (3), "centre left" (4), and "left" (5). Even
with only 5 positions, the number 5^t of possible processes of length t becomes enormously large with increasing t; e.g. for t = 10 we have about 10 million different processes. Assuming that each process needs about 10 seconds, the whole game consists of less than 600 processes. So the probability of each of the possible processes actually occurring during a game is practically zero, much too small for any significant empirical result. The way to handle this problem is to build clusters of similar processes, which in turn needs an idea or a model of what "similar" means and which clusters are necessary. Here a KFM can help to automatically detect proper clusters and so to separate the game processes into classes that represent the characteristic structure of the game (also see figure 4).

Figure 8: Pattern of a game process (state/position over process steps; also see figure 4)

The procedure described above does not work if the number of processes available for the network training is too small. One way to avoid this problem is to generate data by Monte Carlo methods, which, however, need information about the structure the data are generated from. If such a structure is known, a basic DyCoN can be prepared using virtually generated data. The network pre-trained this way can then be used to learn and test the original game data, as is shown in figure 10: Here the DyCoN network was trained with virtual process data generated using a model of the characteristic process structure of squash matches. In the case of squash, this model was derived from the following idea (see figure 9):

Figure 9: Squash process as a sequence of striking positions
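Returning for a moment to the combinatorial estimate at the start of this section, the numbers can be checked directly; this is plain arithmetic, with no assumptions beyond the five positions and the roughly 600 processes per game:

```python
# Combinatorial explosion of position processes: 5 positions, length t.
for t in (5, 10):
    print(t, 5 ** t)            # 5 -> 3125, 10 -> 9765625

# With fewer than 600 ten-second processes per game, the chance of one fixed
# length-10 process occurring is on the order of 600 / 5**10:
print(600 / 5 ** 10)            # ~6e-05, practically zero
```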
Understanding a player-specific squash process as a sequence of striking positions of that player, mainly the four positions "back-right" (BR), "back-left" (BL), "front-right" (FR), and "front-left" (FL) have to be distinguished. A process is then characterised by moves from one position to the following one (via the "T", of course), which in general can be described by the probabilities of changing positions. These probabilities feed into a stochastic transition matrix, from which finally arbitrary processes can be generated by Monte Carlo methods. In order to make processes of different lengths comparable, the processes are normalised to length 4. In the case of original processes this means cutting them continuously into pieces of length 4. In order not to lose any transition information, the cutting was done using a sliding window, such that each position in a process defines the start of a single 4-step process (except the last three positions, of course). The special number "4" turned out to fit best but could be replaced by any other not too large number as well. Figure 10 shows a DyCoN that is pre-trained with virtually generated processes of length 4, where the most frequent ones are marked by circles and explained by legends in the corresponding boxes. This network has been the basis for squash-specific trainings and tests, some of which are presented in the following figures.
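The generation scheme just described can be sketched as follows; the transition matrix below is invented for illustration and not estimated from real squash data:

```python
import random

# Sketch of the virtual-data generation: a stochastic transition matrix over
# the four striking positions, Monte Carlo generation of rallies, and
# sliding-window normalisation to 4-step processes. Probabilities are invented.

POSITIONS = ["BR", "BL", "FR", "FL"]
TRANSITIONS = {   # P(next position | current position), rows sum to 1
    "BR": [0.5, 0.3, 0.1, 0.1],
    "BL": [0.3, 0.5, 0.1, 0.1],
    "FR": [0.4, 0.2, 0.2, 0.2],
    "FL": [0.2, 0.4, 0.2, 0.2],
}

def generate_rally(length, rng=random):
    """Generate one rally of striking positions by Monte Carlo simulation."""
    pos = rng.choice(POSITIONS)
    rally = [pos]
    for _ in range(length - 1):
        pos = rng.choices(POSITIONS, weights=TRANSITIONS[pos])[0]
        rally.append(pos)
    return rally

def sliding_windows(rally, k=4):
    """Cut a rally into overlapping k-step processes (one per start position)."""
    return [tuple(rally[i:i + k]) for i in range(len(rally) - k + 1)]
```

Training a network on many such generated 4-step processes yields a map of the characteristic process structure, onto which the comparatively few original processes can then be coined.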
Figure 10: DyCoN, pre-trained with virtually generated squash processes

Learning and testing player-specific squash data on such a pre-trained DyCoN results in process patterns that characterise specific playing structures, as is shown in figure 11: Players A and B were well-qualified opponents in a semi-final of an international tournament.

Figure 11: Process patterns of opponent players in a squash game
As in figure 10, the diameters of the coloured circles represent the frequencies of the corresponding processes and so form a pattern of the player-specific game structure. As can be seen, the quantitative distributions of position sequences for the two players are rather similar, which means that neither of the players is able to dominate his opponent from a general tactical point of view. On the contrary, it seems that both players tend to find something like a "common rhythm". We found similar effects in a lot of examples and discussed them with squash experts, who confirmed our results from practical experience.

However, the clustered processes of course represent extremely aggregated information. Deeper analyses show that there are also different tactical structures, depending on the player as well as on the phase of the game. Again, DyCoN can help to find such details and to prepare more specific time-dependent processes. This is done in section 3.2 using trajectories as appropriate mappings of complex processes.

The current section 3.1 will be closed with a different approach of network-based cluster analysis, which has successfully been used in the case of tennis and volleyball. One problem with those games is that not so much the positions are of interest but rather the phases of the game or the specific techniques used in those phases. In tennis we have, e.g., the phases of first and second service, the return phase, the baseline phase, or the net phase. In volleyball we have, besides others, field defence, assist, or block. All these phases can be handled like attributes describing the playing process. They have in common, however, that they cannot be measured numerically like positions. This in particular means that they do not define a natural similarity and so a priori cannot be used to define clusters of similar processes.
The way we handled this problem is as follows: At any step in a process, different phases are possible, which have different values of probability (stochastic approach) or degrees of membership (fuzzy approach). Therefore each step of the process has to be mapped to a distribution vector over the set of possible phases (or techniques or whatever). As an example, this is done in figure 12 (compare figure 8), where the numbers on top encode the phases and the lines from top to bottom mark the process steps. For each process step we have a distribution measuring the probability or the grade of membership of the respective phase at that step. It can be seen in the example that the first steps are quite unambiguously determined, whereas the later steps become more undetermined and fuzzy. Because the distance between such distributions can be measured, this gives a way to measure the similarity between phase-oriented processes.

Figure 12: Phase pattern of a phase-oriented process
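The distance idea can be sketched as follows, assuming a Euclidean step-wise distance (the paper leaves the concrete metric open); the phase names and the numbers are invented:

```python
import math

# Sketch: each process step is a distribution over phases; two phase-oriented
# processes are compared step by step. Values below are illustrative only.

def step_distance(p, q):
    """Euclidean distance between two phase distributions of one step."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def process_distance(proc_a, proc_b):
    """Mean step distance between two equally long phase-oriented processes."""
    return sum(step_distance(p, q) for p, q in zip(proc_a, proc_b)) / len(proc_a)

# Two 3-step processes over 4 phases (e.g. service, return, baseline, net):
a = [(1.0, 0, 0, 0), (0, 0.8, 0.2, 0), (0, 0.3, 0.4, 0.3)]
b = [(1.0, 0, 0, 0), (0, 0.6, 0.4, 0), (0, 0.2, 0.5, 0.3)]
print(process_distance(a, b))   # small value: the processes are similar
```

With such a measure, phase-oriented processes become comparable and can be clustered by a network just like the numerically measured position processes.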
3.2 Trajectories as qualitative process patterns

The examples of network-based process analysis from above are quantitative insofar as the patterns focus on the frequencies of processes (diameters of circles and squares in the graphics above). In many cases, however, the sequence or time series of the processes plays the important role. One main example from the area of games is squash itself: In figure 11 the frequencies of the 4-step processes are represented by the diameters of the circles, giving a quantitative understanding of the respective distributions of activities. This presentation, however, is not informative if we want to know what specific playing processes look like, e.g. in order to get a qualitative understanding of the game. In this case we need the time-oriented sequences, i.e. the so-called trajectories, of those 4-step processes, as is shown in figure 13.

Figure 13: Qualitative patterns: trajectory of 4-step processes in squash

The particular example from figure 13 represents a trajectory from (FL,BR,BR,BR) over (BR,BR,BL,BR) and (BR,BL,BR,BR) to (BL,BR,BR,BR) and so altogether represents the activity sequence (FL,BR,BR,BL). Figure 14 compares the quantitative with the qualitative presentation: the upper graphics show the frequently activated 4-step processes embedded into areas meaning "mainly BR" (red line), "mainly BL" (green line), and "mainly F" (blue line), while, regarding the same areas, the lower graphics show the corresponding trajectories.

Figure 14: Quantitative patterns (frequencies) vs. qualitative patterns (trajectories)
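Assuming the strict sliding-window cutting of section 3.1, a trajectory of overlapping 4-step processes folds back into one longer activity sequence; a small sketch (the marked nodes in the figure need not follow the overlap exactly):

```python
# Under sliding-window cutting, consecutive k-step windows overlap in k-1
# positions, so a trajectory of windows encodes one longer sequence.

def windows(seq, k=4):
    """Cut a sequence into overlapping k-step windows."""
    return [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]

def unfold(ws):
    """Recover the sequence: first element of every window + tail of the last."""
    return [w[0] for w in ws] + list(ws[-1][1:])

seq = ["FL", "BR", "BR", "BL", "BR", "BR"]
assert unfold(windows(seq)) == seq
```

This is why the first positions of the windows along a trajectory, read in order, already give the activity sequence, as in the (FL,BR,BR,BL) example above.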
Although the trajectories in this particular example surely give too many details and so are not very informative, the different process structures in the three pictures and their correspondence to the frequencies are obvious. (Note that in the upper graphics only the most frequent activities are marked, while in the lower graphics necessarily each step of the corresponding sequence had to be marked.)

We need those trajectories in particular if the activities associated to neurones are not whole processes, as in the examples of football, volleyball or squash, but single events in a time series. E.g. in a motion process the single event is a time-dependent vector of coordinates, angles, and speeds, which together build the pattern of the motion and whose best presentation is a trajectory. The following section gives a couple of examples:

3.3 Trajectory detection from complex processes

Structure analysis of 2-dimensional movements

The following example (see figure 15) deals with the well-known experiment of learning to follow a (green) light on a screen with a (red) pencil.

Figure 15: Analysis of visually driven movements (light coordinates, pencil coordinates and deviations d_i over time)

The example shows that it depends on the point of view what information is necessary and what presentation fits best:

(1) If only a sequence of values is of interest, as e.g. the deviation d_i in the example from figure 15, there are two ways to handle it: (a) If the focus is on the complete process (as in football, tennis or volleyball), then the process as a whole can be trained to the network and will be represented by a specific associated neurone. (b) If the focus is on single steps together with their close process context (as in squash), then k-step processes as consecutive parts of the whole sequence should be trained, leading to results and presentations comparable to those from squash.
This in particular means that, similar to squash, either the quantitative frequency-oriented presentation or the qualitative trajectory-oriented presentation can be chosen. If the complete data vectors are of interest, e.g. consisting of light coordinates, pencil coordinates and deviation in the example from figure 15, they can be handled as components of a process as in case (1). The problem, however, is that compared with case (1) the dimension of the event space now multiplies. Hence the resolvability of the network is reduced significantly, and so information that should better be separated might be aggregated into clusters. In this case, the following alternative fits better:
(2) If the focus is on the data vectors, they themselves can be handled as complete "processes" (i.e. they play the role of the 4-step processes in squash) and can be associated to neurones. Similar to squash, either the quantitative frequency-oriented presentation or the qualitative trajectory-oriented presentation can then be chosen.

In the particular project figure 15 stems from, we chose approach (1)(b) with absolute coordinates and quantitative representation. The result was that mainly the most frequent k-step sequences were represented by neurones, which means that the network automatically selected the interesting regions, i.e. the regions where the moving direction turned (marked by rectangles in figure 15).

Reduction of attribute dimensions

Another approach of using Neural Networks is that of reducing complexity: Figure 16 shows the situation that the original data are high-dimensional vectors, where the number of data is comparatively small. In this case, stochastic analysis normally is useless, and even training of a "conventional" KFM does not work. The reason is, as was discussed in the football example above, that the space of possible events is so large that the small number of recorded data cannot represent its specific structure. However, using a stochastically pre-trained DyCoN we can "burn in" the recorded data, resulting in a 2-dimensional trajectory, i.e. a reduction from high-dimensional data vectors to the 2-dimensional coordinates of the associated trajectory points. Figure 16 shows two examples of those (x,y)-trajectories, which now can be analysed or manipulated much more easily than the replaced original (a,b,...,z)-data.
Figure 16: Reduction by transforming attribute vectors (a_t, b_t, ..., z_t) to pairs of coordinates (x_t, y_t)

Figure 17 gives an example from a current project dealing with running: The data vectors are of dimension 20 and consist of coordinates, angles, and velocities. Each time series consists of about 20 vectors, recorded at equidistant time steps. The aim (besides others) is to find out intra- and inter-individual similarities as well as reference patterns. Obviously, neither stochastic nor deterministic conventional methods fit a situation like this. Using the "burning in" method from above, however, results in a 2-dimensional trajectory representation of the high-dimensional data vectors and so enables a rather simple analysis of the movement patterns. This way, we successfully did the following analyses:
- Intra-individual comparisons of the movements of the right and the left leg.
- Intra-individual comparisons of movements at different speeds.
- Inter-individual comparisons in order to find similarities and reference patterns.
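The "burn in" reduction can be sketched as follows: each high-dimensional vector is mapped to the grid coordinates of its best-matching neurone, so a time series of 20-dimensional vectors becomes a 2-dimensional trajectory. The random weights below merely stand in for a pre-trained DyCoN:

```python
import numpy as np

# Sketch of trajectory-based dimension reduction: every data vector of a time
# series is replaced by the (x, y) grid position of its best-matching neurone.
# The weights here are random stand-ins for a pre-trained network.

rng = np.random.default_rng(1)
grid_h, grid_w, dim = 15, 15, 20
weights = rng.random((grid_h, grid_w, dim))   # would come from pre-training

def to_trajectory(series):
    """Map each vector of a time series to (x, y) of its best-matching unit."""
    traj = []
    for v in series:
        d = np.linalg.norm(weights - v, axis=2)
        traj.append(np.unravel_index(np.argmin(d), d.shape))
    return traj

series = rng.random((20, 20))                 # ~20 vectors of dimension 20
trajectory = to_trajectory(series)            # 20 pairs of grid coordinates
```

The resulting 2-dimensional trajectories can then be compared, overlaid, or clustered with simple geometric means, which is what makes the intra- and inter-individual comparisons above feasible.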
In particular, we were able to separate relevant from irrelevant or redundant attribute data and so, as is shown in figure 17, could reduce the dimension of the data vectors themselves.

Figure 17: Reduction by selecting relevant attributes (20-dimensional vectors reduced to 10-dimensional ones)

Even more extreme was the situation in a project dealing with the process of rehabilitation, where 12-dimensional data vectors characterised the time-dependent states of the patients, and time series of only about 5 to 10 points in time had to be analysed in order to estimate the success of the healing process. We proceeded in the same way as in the example above and got satisfying results: In particular, key situations could be marked and compared with the trajectories, indicating whether the process ran successfully or not. Moreover, this type of analysis not only gives a quantitative value of the rehabilitation process like "good" or "bad" but also gives an idea of its qualitative dynamics, showing that in some of the processes the situations did not at all improve from "bad" to "good" but showed cycles or even counter-productive developments. (The trajectories from figure 16 are examples from the rehabilitation project.)

So far, the presented approaches, projects, and results were meant to demonstrate that the DyCoN concept is useful for dynamic learning and recognition of patterns as well as for the reduction and evaluation of information. The following paragraph will sketch some ideas on how to use and improve these skills in order to support the unsupervised acting of so-called multi-agents and even robots.

4 Outlook

4.1 Situation patterns and activity selection

Acting in a game means for a player (1) to recognise his situation in the context of the game, (2) to select an appropriate action, and (3) to value and to learn the success of his action in the context of the game (see figure 18).
The three marked phases require three different types of network application: The situation recognition can be done using KFMs, the action selection can be controlled by feed-forward networks, and the feedback and learning should be done using a DyCoN. This last step, however, needs a lot of additional investigation and is currently dealt with in a sponsored project. Some main aspects of this project are briefly sketched in the following:

First of all, there is a discrepancy between the number and the relevance of events. This means that very frequent events in a process often are not the most relevant ones. On the contrary, often the really relevant events are very seldom key events, which in turn are not recognised by networks because of their low number. So one first step in the project has to be to connect DyCoN with a database in order to find relevant events that have not been recorded in associated neurone clusters. Here the term "relevant" has to be discussed: There are a lot of quantitative and qualitative interpretations of that term. What we mean, briefly spoken, is an event that triggers a non-frequent or non-standard process with a very specific successful or non-successful result.

The second step should then be the additional integration of those relevant events into the situation recognition of DyCoN. This means controlling not only the neurone training but also the structural development of a DyCoN dynamically, for which methods and strategies have to be developed.

Finally, in a third step, DyCoN should be given the skill of being "creative". This means that the selection of an activity should depend not only on highest frequencies but also on special contexts and specific associated processes.

Figure 18: Player's feedback measuring the result of recognition and selection

Figure 19: Dynamically completing a trained DyCoN by "creative" clusters and channels
Figure 19 shows how the "creative" completion of an already trained DyCoN will be done by implanting an additional cluster representing the associated striking features of the learned process data. The result could be not only additional clusters but also associative channels between clusters, which cannot appear in conventional networks of Kohonen type (also see figure 20): If a player recognises his current situation, he normally connects it to a specific standard type of activity, i.e. type I, II, or III in figures 19 and 20. If, however, the recognised situation is "close to the border" of the selected type, it might be a striking feature and so trigger a "creative" decision for an unusual activity. Moreover, such an association could be repeated in a reflected way, eventually leading back to a different standard activity, as is also shown in figure 20.

Figure 20: Associations between activities, due to striking features and creative decisions

Note, however, that in the scenarios sketched above "player" always means a multi-agent or robot, where multi-agent is the more general term: It means the member of a collection of structurally identical objects which, on the basis of identical software, individually recognise their environment and select their activities. Multi-agents this way can learn individually, can make their own experiences, can develop their specific skills, and can communicate with each other. So finally they are able to collaborate on common tasks, as e.g. a game is. Different from multi-agents, which normally are simulations (i.e. software together with data), robots are material agents, able to move and to act in reality. During the last six years, an international community of working groups called "RoboCup" has developed, which, sponsored by international businesses, deals with football played by teams of simulated multi-agents or real robots.
Although the long-term goal of those multi-agents of course is not to play against human players but to study and improve robot behaviour, the similarity in the problems of learning, decision making, and acting between human beings and agents is striking. Therefore it seems worthwhile to deal with those questions in both directions: On the one hand, the analysis of human acting can help to improve virtual agents and robots. On the other hand, the analysis of the self-controlling and improving tactical behaviour of multi-agents might help towards better understanding and improving human tactical behaviour as well.

References

Perl, J. (1999). Aspects and Potentiality of Unconventional Modeling of Processes in Sporting Events. In: B. Scholz-Reiter, H.-D. Stahlmann & A. Nethe (Eds.), Process Modelling (S ). Berlin-Heidelberg: Springer.