IMPROVING ADAPTIVE GAME AI WITH EVOLUTIONARY LEARNING



Marc Ponsen
Lehigh University / Computer Science & Engineering
19 Memorial Drive West, Bethlehem, PA 18015-3084, USA
mjp304@lehigh.edu

Pieter Spronck
Maastricht University / IKAT
P.O. Box 616, NL-6200 MD Maastricht, The Netherlands
p.spronck@cs.unimaas.nl

KEYWORDS

Games, artificial intelligence, real-time strategy, dynamic scripting, evolutionary algorithms.

ABSTRACT

Game AI is defined as the decision-making process of computer-controlled opponents in computer games. Adaptive game AI can improve the entertainment provided by computer games by allowing the computer-controlled opponents to automatically fix weaknesses in the game AI and to respond to changes in human-player tactics online, i.e., during gameplay. Successful adaptive game AI is invariably based on domain knowledge of the game it is used in. Dynamic scripting is an algorithm that implements adaptive game AI. The domain knowledge used by dynamic scripting is stored in a rulebase with manually designed rules. In this paper we propose the use of an offline evolutionary algorithm to enhance the performance of adaptive game AI by evolving new domain knowledge. We empirically validate our proposal, using dynamic scripting as adaptive game AI in a real-time strategy (RTS) game, in three steps: (1) we implement and test dynamic scripting in an RTS game; (2) we use an offline evolutionary algorithm to evolve new tactics that are able to deal with optimised tactics, which dynamic scripting cannot defeat using its original rulebase; and (3) we translate the evolved tactics to rules in the rulebase, and test dynamic scripting with the revised rulebase. The empirical validation shows that the revised rulebase yields a significantly improved performance of dynamic scripting compared to the original rulebase. We therefore conclude that offline evolutionary learning can be used to improve the performance of adaptive game AI.

INTRODUCTION

Traditionally, commercial game developers spend most of their resources on improving a game's graphics. However, in recent years, game developers have begun to compete with each other by providing a more challenging gaming experience (Rabin 2004). For most games, challenging gameplay is equivalent to having high-quality game AI (Laird 2000). Game AI is defined as the decision-making process of computer-controlled opponents. Even in state-of-the-art games, game AI is, in general, of inferior quality (Schaeffer 2001, Laird 2001, Gold 2004). It tends to be predictable, and often contains weaknesses that human players can exploit. Adaptive game AI, which implies the online (i.e., during gameplay) adaptation of the behaviour of computer-controlled opponents, has the potential to increase the quality of game AI. It has been widely disregarded by game developers, because online learning tends to be slow and can lead to undesired behaviour (Manslow 2002). However, academic game AI researchers have shown that successful adaptive game AI is feasible (Demasi and Cruz 2002, Johnson 2004, Spronck, Sprinkhuizen-Kuyper and Postma 2004a). To ensure the efficiency and reliability of adaptive game AI, it must incorporate a great amount of prior domain knowledge (Manslow 2002, Spronck, Sprinkhuizen-Kuyper and Postma 2004b). However, if the incorporated domain knowledge is incorrect or insufficient, adaptive game AI will not be able to generate satisfying results. In this paper we propose an evolutionary algorithm to improve the quality of the domain knowledge used by adaptive game AI.

We empirically validate our proposal by testing it on an adaptive game AI technique called dynamic scripting, used in a real-time strategy (RTS) game. The outline of the remainder of the paper is as follows. First, we discuss RTS games and the game environment selected for the experiments. Then, we discuss the implementation of dynamic scripting for RTS games, followed by a discussion of the implementation of an evolutionary algorithm that generates successful tactics for RTS games. The achieved results are used to show how the tactics discovered with an evolutionary algorithm can be employed to improve the original dynamic scripting implementation. Finally, we draw conclusions and indicate future work.

REAL-TIME STRATEGY GAMES

RTS games are simple military simulations (war games) that require the player to control armies (consisting of different types of units) and defeat all opposing forces. In most RTS games, the key to winning lies in efficiently collecting and managing resources, and appropriately distributing these resources over the various game elements. Typical game elements in RTS games include the construction of buildings, the research of new technologies, and combat. Game AI in RTS games determines the tactics of the armies controlled by the computer, including the management of resources. Game AI in RTS games is particularly challenging for game developers for two reasons: (1) RTS games are complex, i.e., a wide variety of tactics can be employed, and (2) decisions have to be made in real time, i.e., under severe time constraints. RTS games have been called an ideal test-bed for real-time AI research (Buro 2003).

For our experiments, we selected the RTS game WARGUS, with STRATAGUS as its underlying engine. STRATAGUS is an open-source engine for building RTS games. WARGUS implements a clone of the highly popular RTS game WARCRAFT II. While the graphics of WARGUS are not up to date with today's standards, its gameplay can still be considered state-of-the-art. Figure 1 illustrates WARGUS. The figure shows a battle between an army of orcs, which approach from the bottom right, and an army of humans, which attempt to defend a base consisting of several buildings.

[Figure 1: Screenshot of a battle in WARGUS.]

ADAPTIVE GAME AI IN RTS GAMES

Game AI for complex games, such as RTS games, is mostly defined in scripts, i.e., lists of rules that are executed sequentially (Tozour 2002). Because the scripts tend to be long and complex (Brockington and Darrah 2002), they are likely to contain weaknesses, which the human player can exploit. Because scripts are static, they cannot adapt to overcome these exploits. Spronck et al. (2004a) designed a novel technique called dynamic scripting that realises the online adaptation of scripted opponent AI. Experiments have shown that the dynamic scripting technique can be successfully incorporated in commercial Computer Role-Playing Games (CRPGs) (Spronck et al. 2004a, 2004b). Because the game AI for WARGUS is defined in scripts, dynamic scripting should also be applicable to WARGUS. However, because of the differences between RTS games and CRPGs, the original dynamic scripting implementation cannot be transferred to RTS games unchanged. In this section a dynamic scripting implementation for the game AI in RTS games is designed and evaluated. The basics of dynamic scripting are explained first. Then, we highlight the changes made to dynamic scripting to apply it to RTS games, and discuss the implementation of dynamic scripting in WARGUS. The implementation is evaluated, and the evaluation results are discussed.

Dynamic Scripting

Dynamic scripting is an online learning technique for commercial computer games, inspired by reinforcement learning (Russell and Norvig 1995). Dynamic scripting generates scripted opponents on the fly by extracting rules from an adaptive rulebase. The rules in the rulebase are manually designed using domain-specific knowledge. The probability that a rule is selected for a script is proportional to a weight value that is associated with each rule, i.e., rules with larger weights have a higher probability of being selected. After every encounter between opponents, the weights of rules employed during gameplay are increased when they made a positive contribution to the outcome, and decreased when they made a negative contribution. The size of the weight changes is determined by a weight-update function. To keep the sum of all weight values in a rulebase constant, weight changes are executed through a redistribution of all weights in the rulebase. Through this process of punishments and rewards, dynamic scripting gradually adapts to the human player. For CRPGs, it has been shown that dynamic scripting is fast, effective, robust and efficient (Spronck et al. 2004a).
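
To make the selection-and-update cycle concrete, the following is a minimal Python sketch of weight-proportional rule selection and weight redistribution. The data layout (a dict mapping rules to weights) and the function names are our own; WARGUS-specific details, such as per-state rulebases and explicit weight bounds, are described below.

```python
import random

def select_script(rulebase, script_size):
    """Extract rules for a script; the chance that a rule is picked is
    proportional to its weight, and each rule is picked at most once."""
    candidates = dict(rulebase)                # rule -> weight
    script = []
    while candidates and len(script) < script_size:
        rules = list(candidates)
        weights = [candidates[r] for r in rules]
        pick = random.choices(rules, weights=weights, k=1)[0]
        script.append(pick)
        del candidates[pick]
    return script

def apply_weight_changes(rulebase, adjustments):
    """Add the per-rule weight changes, then rescale so that the total
    weight in the rulebase stays constant (a simple redistribution scheme)."""
    total_before = sum(rulebase.values())
    for rule, delta in adjustments.items():
        rulebase[rule] = max(1.0, rulebase[rule] + delta)   # keep weights positive
    scale = total_before / sum(rulebase.values())
    for rule in rulebase:
        rulebase[rule] *= scale
```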

Dynamic Scripting for RTS Games

Our design of dynamic scripting for RTS games has two differences with dynamic scripting for CRPGs. The first difference is that, while dynamic scripting for CRPGs employs different rulebases for different opponent types in the game (Spronck et al. 2004a), our RTS implementation of dynamic scripting employs different rulebases for the different states of the game. The reason for this deviation from the CRPG implementation of dynamic scripting is that, in contrast with CRPGs, the tactics that can be used in an RTS game mainly depend on the availability of different unit types and technologies. For instance, attacking with weak units might be the only viable choice in early game states, while in later game states, when strong units are available, weak units will usually have become useless.

The second difference is that, while dynamic scripting for CRPGs executes weight updates based on an evaluation of a fight, our RTS implementation of dynamic scripting executes weight updates based on both an evaluation of the performance of the game AI during the whole game (called the "overall fitness"), and an evaluation of the performance of the game AI between state changes (called the "state fitness"). As such, the weight-update function is based on the state fitness combined with the overall fitness. Using both evaluations for the weight updates increases the efficiency of the learning mechanism (Manslow 2004).

Dynamic Scripting in WARGUS

We implemented the dynamic scripting process in WARGUS as follows. Dynamic scripting starts by randomly selecting rules for the first state. When a rule is selected that spawns a state change, from that point on rules will be selected for the new state. To avoid monotonous behaviour, we restricted each rule to be selected only once for each state. At the end of the scripts, a loop is implemented that initiates continuous attacks against the enemy. Because in WARGUS the available buildings determine the unit types that can be built and the technologies that can be researched, we decided to distinguish game states according to the types of buildings possessed. Consequently, state changes are spawned by rules that comprise the creation of new buildings. The twenty states for WARGUS, and the possible state changes, are illustrated in Figure 2.

[Figure 2: Game states in WARGUS. Legend: Th = Townhall, Ba = Barracks, Lm = Lumbermill, Kp = Keep, St = Stables, Ca = Castle, Ap = Airport, Mt = Magetower, Tm = Temple, plus the Blacksmith.]

We allowed a maximum of 100 rules per script. The rulebases for each of the states contained between 21 and 42 rules. The rules can be divided into four basic categories: (1) build rules (for constructing buildings), (2) research rules (for acquiring new technologies), (3) economy rules (for gathering resources), and (4) combat rules (for military activities). To design the rules, we incorporated domain knowledge acquired from strategy guides for WARCRAFT II.

The overall fitness function F for the player controlled by dynamic scripting (henceforth called the "dynamic player") yields a value in the range [0,1]. It is defined as:

    F = min(S_d / (S_d + S_o), b)    {lost}
    F = max(S_d / (S_d + S_o), b)    {won}                                          (1)

In equation (1), S_d represents the score for the dynamic player, S_o represents the score for the dynamic player's opponent, and b in [0,1] is the break-even point. At the break-even point, weights remain unchanged. For the dynamic player, the state fitness F_i for state i is defined as:

    F_i = S_d,i / (S_d,i + S_o,i)                                                   {i = 1}
    F_i = (S_d,i - S_d,i-1) / ((S_d,i + S_o,i) - (S_d,i-1 + S_o,i-1))               {i > 1}    (2)

In equation (2), S_d,x represents the score of the dynamic player after state x, and S_o,x represents the score of the dynamic player's opponent after state x. The score function is domain-dependent, and should successfully reflect the relative strength of the two opposing players in the game. For WARGUS, we defined the score S_x for player x as:

    S_x = 0.7 M_x + 0.3 B_x                                                         (3)

In equation (3), M_x represents the military points for player x, i.e., the number of points awarded for killing units and destroying buildings, and B_x represents the building points for player x, i.e., the number of points awarded for training armies and constructing buildings.

After each game, the weights of all rules employed are updated. The weight-update function translates the fitness functions into weight adaptations for the rules in the script. The weight-update function W for the dynamic player is defined as:

    W = max(W_min, W_org - 0.3 P (b - F) / b - 0.7 P (b - F_i) / b)                 {F < b}
    W = min(W_max, W_org + 0.3 R (F - b) / (1 - b) + 0.7 R (F_i - b) / (1 - b))     {F >= b}   (4)

In equation (4), W is the new weight value, W_org is the original weight value, P is the maximum penalty, R is the maximum reward, W_max is the maximum weight value, W_min is the minimum weight value, F is the overall fitness of the dynamic player, F_i is the state fitness for the dynamic player in state i, and b is the break-even point. The equation indicates that we prioritise state performance over overall performance. The reason is that, even if a game is lost, we wish to prevent rules from being punished (too much) in states where performance was successful. In our simulation we set P to 175, R to 200, W_max to 1250, W_min to 25, and b to 0.5.
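
For concreteness, equations (1) to (4) can be combined into a weight-update routine along the following lines. This is a minimal Python sketch using the parameter values reported above; the function and variable names are our own, and the redistribution of the remaining weights across the rulebase (described earlier) is omitted.

```python
P_MAX, R_MAX = 175, 200      # maximum penalty and maximum reward
W_MIN, W_MAX = 25, 1250      # weight bounds
B = 0.5                      # break-even point

def score(military_points, building_points):
    """Equation (3): S_x = 0.7 * M_x + 0.3 * B_x."""
    return 0.7 * military_points + 0.3 * building_points

def overall_fitness(s_d, s_o, won):
    """Equation (1): overall fitness of the dynamic player, clipped at b."""
    ratio = s_d / (s_d + s_o)
    return max(ratio, B) if won else min(ratio, B)

def state_fitness(s_d, s_o, s_d_prev=None, s_o_prev=None):
    """Equation (2): fitness for one state; pass the previous-state scores
    for every state after the first."""
    if s_d_prev is None:                     # i == 1
        return s_d / (s_d + s_o)
    return (s_d - s_d_prev) / ((s_d + s_o) - (s_d_prev + s_o_prev))

def updated_weight(w_org, f, f_i):
    """Equation (4): combine overall fitness F and state fitness F_i,
    prioritising state performance (weights 0.3 and 0.7)."""
    if f < B:
        penalty = 0.3 * P_MAX * (B - f) / B + 0.7 * P_MAX * (B - f_i) / B
        return max(W_MIN, w_org - penalty)
    reward = 0.3 * R_MAX * (f - B) / (1 - B) + 0.7 * R_MAX * (f_i - B) / (1 - B)
    return min(W_MAX, w_org + reward)
```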

Evaluating Dynamic Scripting in WARGUS

We evaluated the performance of dynamic scripting for RTS games in WARGUS by letting the computer play the game against itself. One of the two opposing players was controlled by dynamic scripting (the dynamic player), and the other was controlled by a static script (the static player). Each game lasted until one of the players was defeated, or until a certain period of time had elapsed. If the game ended due to the time restriction (which was rarely the case), the player with the highest score was considered to have won. After the game, the rulebases were adapted, and the next game was started, using the adapted rulebases. A sequence of 100 games constituted one test. We tested four different tactics for the static player:

1. Small Balanced Land Attack (SBLA): The SBLA is a tactic that focuses on land combat, keeping a balance between offensive actions, defensive actions, and research. The SBLA is applied on a small map. Games on a small map are usually decided swiftly, with fierce battles between weak armies.

2. Large Balanced Land Attack (LBLA): The LBLA is similar to the SBLA, but applied on a large map. A large map allows for a slower-paced game, with long-lasting battles between strong armies.

3. Soldier's Rush (SR): The soldier's rush aims at overwhelming the opponent with cheap offensive units in an early state of the game. Since the soldier's rush works best in fast games, we tested it on a small map.

4. Knight's Rush (KR): The knight's rush aims at quick technological advancement, launching large offences as soon as strong units are available. Since the knight's rush works best in slower-paced games, we tested it on a large map.

To quantify the relative performance of the dynamic player against the static player, we used the randomization turning point (RTP). The RTP is measured as follows. After each game, a randomization test (Cohen 1995, pp. 168-170) is performed using the fitness values over the last ten games, with the null hypothesis that both players are equally strong. The dynamic player is said to outperform the static player if the randomization test concludes that the null hypothesis can be rejected with 90% probability in favour of the dynamic player. The RTP is the number of the first game in which the dynamic player outperforms the static player. A low value for the RTP indicates good efficiency of dynamic scripting. If the player controlled by dynamic scripting is unable to statistically outperform the static player within 100 games, the test is aborted. For the SBLA we ran 31 tests. For the LBLA we ran 21 tests. For both the SR and the KR, we ran 10 tests.
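
The randomization test behind the RTP can be implemented as a standard permutation test. The sketch below is only illustrative: it assumes a per-game fitness value is recorded for each player over the last ten games and uses the difference of mean fitness as the test statistic, which the paper does not specify.

```python
import random

def randomization_test(dynamic_fitness, static_fitness, trials=10000, seed=0):
    """One-sided permutation test for the null hypothesis that both players
    are equally strong; returns the estimated p-value for the alternative
    that the dynamic player is stronger."""
    rng = random.Random(seed)
    n = len(dynamic_fitness)
    observed = sum(dynamic_fitness) / n - sum(static_fitness) / len(static_fitness)
    pooled = list(dynamic_fitness) + list(static_fitness)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        diff = sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n)
        if diff >= observed:
            extreme += 1
    return extreme / trials

# The dynamic player outperforms the static player when the null hypothesis is
# rejected at the 90% level (here interpreted as p < 0.10); the RTP is the
# number of the first game for which this holds.
```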

Results

The results of the evaluation of dynamic scripting in WARGUS are displayed in Table 1. From left to right, the table displays (1) the tactic used by the static player, (2) the number of tests, (3) the lowest RTP found, (4) the highest RTP found, (5) the average RTP, (6) the median RTP, (7) the number of tests that did not find an RTP within 100 games, and (8) the average number of games won out of 100.

Tactic   Tests   Low   High   Avg   Med   >100   Won
SBLA     31      18    99     50    39    0      59.3
LBLA     21      19    79     49    47    0      60.2
SR       10      -     -      -     -     10     1.2
KR       10      -     -      -     -     10     2.3

Table 1: Evaluation results of dynamic scripting in RTS games.

From the low values of the RTPs for both the SBLA and the LBLA, we can conclude that the dynamic player efficiently adapts to these two tactics. Therefore, we conclude that dynamic scripting in our implementation can be applied successfully to RTS games. However, the dynamic player was unable to adapt to the soldier's rush and the knight's rush within 100 games. As the rightmost column in Table 1 shows, the dynamic player only won approximately 1 out of 100 games against the soldier's rush, and 1 out of 50 games against the knight's rush. The reason for the inferior performance of the dynamic player against the two rush tactics is twofold: (1) the rush tactics are optimised, in the sense that it is very hard to design game AI that is able to deal with them, and (2) the rulebase does not contain the appropriate knowledge to easily design game AI that is able to deal with the rush tactics. The remainder of this paper investigates how offline evolutionary learning can be used to improve the rulebase to deal with optimised tactics.

EVOLUTIONARY TACTICS

In this section we empirically investigate to what extent an evolutionary algorithm can be used to search for effective tactics for RTS games. Our goal is to use offline evolutionary learning to design tactics that can be used to defeat the two optimised tactics described in the previous section, the soldier's rush and the knight's rush. In the following subsections we describe the procedure used, the encoding of the chromosome, the fitness function, the genetic operators, and the achieved results.

Experimental Procedure

We designed an evolutionary algorithm that evolves new tactics to be used in WARGUS against a static player using the soldier's rush and the knight's rush tactics. The evolutionary algorithm uses a population of size 50, representing sample solutions (i.e., game-AI scripts). Relatively successful solutions (as determined by a fitness function) are allowed to breed. To select parent chromosomes for breeding, we use size-3 tournament selection (Buckland 2004). This method prevents early convergence and is computationally fast. Newly generated chromosomes replace existing solutions in the population, using size-3 crowding (Goldberg 1989). Our goal is to generate a chromosome with a fitness exceeding a target value. When such a chromosome is found, the evolution process ends. This is the fitness-stop criterion. We set the target value to 0.75 against the soldier's rush, and to 0.7 against the knight's rush. Since there is no guarantee that a solution exceeding the target value will be found, the evolution process also ends after it has generated a maximum number of solutions. This is the run-stop criterion. We set the maximum number of solutions to 250. The choices for the fitness-stop and run-stop criteria were determined during preliminary experiments.
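
The selection and replacement scheme can be sketched as follows (Python; the population is simplified to a list of (chromosome, fitness) pairs, and the similarity measure used for crowding is our own assumption, since the paper does not describe it).

```python
import random

def tournament_select(population, k=3):
    """Size-k tournament selection: the fittest of k randomly drawn individuals
    becomes a parent. Each individual is a (chromosome, fitness) pair."""
    return max(random.sample(population, k), key=lambda ind: ind[1])

def crowding_replace(population, child, similarity, k=3):
    """Size-k crowding: the new child replaces the most similar of k randomly
    chosen individuals, which helps to preserve diversity in the population."""
    slots = random.sample(range(len(population)), k)
    victim = max(slots, key=lambda i: similarity(population[i][0], child[0]))
    population[victim] = child
```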

Encoding

The evolutionary algorithm works with a population of chromosomes. In the present context, a chromosome represents a game-AI script. To encode a game-AI script for WARGUS, each gene in the chromosome represents one rule. Four different gene types are distinguished, corresponding to the four basic rule categories mentioned in the previous section, namely (1) build genes, (2) research genes, (3) economy genes, and (4) combat genes. Each gene consists of a rule ID that indicates the type of gene (B, R, E and C, respectively), followed by values for the parameters needed by the gene. Of the combat gene, there are actually twenty variations, one for each possible state, each with its own parameters. The genes are grouped by states. A separate state marker (S), followed by the state number, indicates the start of a state. The chromosome design is illustrated in Figure 3.

[Figure 3: Design of a chromosome to store game AI for WARGUS.]

A schematic representation of the chromosome, divided into states, is shown at the top of the figure. Below it, a schematic representation of one state in the chromosome is shown, consisting of a state marker and a series of rule genes. Rule genes are identified by the number of the state for which they occur, followed by a period, followed by a sequence number. Below the state representation, a schematic representation of one rule is shown. At the bottom, part of an example chromosome is shown. Chromosomes for the initial population are generated randomly. By taking into account state changes spawned by build genes, it is ensured that only legal game-AI scripts are created. A more detailed description of the chromosome design can be found in (Ponsen 2004).

Fitness Function

To measure the success of a game-AI script represented by a chromosome, the following fitness function F for the dynamic player, yielding a value in the range [0,1], is defined:

    F = min((GC / EC) * M_d / (M_d + M_o), b)    {lost}
    F = max(M_d / (M_d + M_o), b)                {won}                              (5)

In equation (5), M_d represents the military points for the dynamic player, M_o represents the military points for the dynamic player's opponent, and b is the break-even point. GC represents the game cycle, i.e., the time it took before the game was lost by one of the players. EC represents the end cycle, i.e., the longest time a game is allowed to continue. When a game reaches the end cycle and neither army has been completely defeated, scores at that time are measured and the game is aborted. The factor GC/EC ensures that losing solutions that play a long game are awarded higher fitness scores than losing solutions that play a short game.
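
A direct transcription of equation (5) in Python might look as follows; the break-even value used for this fitness function is not stated in the paper, so reusing 0.5 here is our assumption.

```python
B = 0.5   # break-even point (assumed; the paper does not state the value used here)

def evolution_fitness(m_d, m_o, game_cycle, end_cycle, won):
    """Equation (5): fitness of an evolved game-AI script based on military points.
    Losing scripts that survive longer receive a proportionally higher fitness."""
    ratio = m_d / (m_d + m_o)
    if won:
        return max(ratio, B)
    return min((game_cycle / end_cycle) * ratio, B)
```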

Genetic Operators

To breed new chromosomes, four genetic operators were implemented. By design, all four genetic operators ensure that a child chromosome always represents a legal game-AI script. Parent chromosomes are selected with a chance corresponding to their fitness values. The genetic operators take into account activated genes. An activated gene is a gene that represents a rule that was executed during the fitness determination. Non-activated genes can be considered irrelevant to the game-AI script the chromosome represents. If a genetic operator produces a child chromosome that is equal to a parent chromosome for all activated genes, the child is rejected and a new child is generated.

1. State Crossover selects two parents, and copies states from either parent to the child chromosome. State crossover is controlled by matching states. A matching state is a state that exists in both parent chromosomes. Figure 2 makes evident that, for WARGUS, there are always at least four matching states, namely state 1, state 12, state 13, and state 20. State crossover will only be used when there are at least three matching states with activated genes. A child chromosome is created as follows. States are copied from the first parent chromosome to the child chromosome, starting at state 1 and working down the chromosome. When there is a state change to a matching state, there is a 50% probability that from that point on the roles of the two parents are switched, and states are copied from the second parent. When the next state change to a matching state is encountered, again a switch between the parents can occur. This continues until the last state has been copied.

2. Rule Replace Mutation selects one parent, and replaces economy, research or combat rules with a 25% probability. Building rules are excluded, both as candidates for replacement and as replacements, because these could spawn a state change and thus could possibly corrupt the chromosome.

3. Rule Biased Mutation selects one parent and mutates parameters of existing economy or combat rules with a 50% chance. The mutations are executed by adding a random integer value in the range [-5, 5].

4. Randomization generates a random new chromosome. Randomization had a 10% chance of being selected during evolution; the other genetic operators each had a 30% chance.
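
As an illustration of operator 3, the following Python sketch shows Rule Biased Mutation. The gene layout (a dict with a 'type' field of 'B', 'R', 'E' or 'C' and a 'params' list) is our own simplification of the encoding described above.

```python
import random
from copy import deepcopy

def rule_biased_mutation(parent, mutation_chance=0.5):
    """Rule Biased Mutation: with the given chance, perturb each parameter of
    economy ('E') and combat ('C') genes by a random integer in [-5, 5].
    Build genes are left untouched so that the state structure stays legal."""
    child = deepcopy(parent)
    for gene in child:
        if gene["type"] in ("E", "C") and random.random() < mutation_chance:
            gene["params"] = [p + random.randint(-5, 5) for p in gene["params"]]
    return child
```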

Results

The results of ten tests of the evolutionary algorithm against each of the two optimised tactics are shown in Table 2. From left to right, the columns show (1) the tactic used by the static player, (2) the number of tests, (3) the lowest fitness value found, (4) the highest fitness value found, (5) the average fitness value, and (6) the number of tests that ended because of the run-stop criterion.

Tactic   Tests   Low    High   Avg    >250
SR       10      0.73   0.85   0.78   2
KR       10      0.71   0.84   0.75   0

Table 2: Evolutionary algorithm results.

The table shows surprisingly high average, highest, and even lowest solution-fitness values. Therefore, it may be concluded that offline evolutionary learning was successful in rapidly discovering game-AI scripts able to defeat both rush tactics used by the static player.

IMPROVING ADAPTIVE AI

In the first experiment, we discovered that our original implementation of dynamic scripting did not achieve satisfying results against the two rush tactics. In the previous section we evolved new tactics that were able to defeat the two rush tactics. In the present section we discuss how the evolved tactics can be used to improve the rulebases employed by dynamic scripting, to enable it to deal with the rush tactics with more success. First, we discuss observations on the evolved tactics. Then, we discuss the translation of the evolved tactics to rulebase improvements. Finally, we evaluate the new rulebases by repeating the first experiment with them.

Observations on the Evolved Tactics

About the solutions evolved against the soldier's rush, the following observations were made. The soldier's rush is used on a small map. As is usual for a small map, the games played by the solutions were always short. Most solutions included only two states with activated genes. Basically, we found that all ten solutions counter the soldier's rush with a soldier's rush of their own. In eight out of ten solutions, the solutions included building a blacksmith very early in the game, which allows the research of weapon and armour upgrades. Then, the solutions selected at least two out of the three possible research advancements, after which large attack forces were created. These eight solutions succeeded because they ensured their soldiers were quickly upgraded to be very effective before they attacked. The remaining two solutions overwhelmed the static player with sheer numbers.

About the solutions evolved against the knight's rush, the following observations were made. The knight's rush is used on a large map, which enticed longer games. On average, for each solution five or six states were activated. Against the knight's rush, all solutions included training a large number of workers to be able to expand quickly. They also included boosting the economy by exploiting additional resource sites after setting up defences. Almost all solutions evolved against the knight's rush worked towards the goal of quickly creating advanced military units, in particular knights. Seven out of ten solutions achieved this goal by employing a specific building order, namely a blacksmith, followed by a lumbermill, followed by a keep, followed by stables. Two out of ten solutions preferred a building order that reached state 11 as quickly as possible (see Figure 2). State 11 is the first state that allows the building of knights. Surprisingly, in several solutions against the knight's rush, the game AI employed many catapults. WARCRAFT II strategy guides generally consider catapults to be inferior military units, because of their high costs and considerable vulnerability. A possible explanation for the successful use of catapults by the evolutionary game AI is that, with their high damaging abilities and large range, they are particularly effective against tightly packed armies, such as groups of knights.

Improving the Rulebase for Dynamic Scripting

Based on our observations we decided to create four new rules for the rulebases, and to (slightly) change the parameters of several existing combat rules. The first new rule was designed to be able to deal with the soldier's rush. The rule contained the pattern that was observed in most of the tactics evolved against the soldier's rush, namely a combination of the building of a blacksmith, followed by the research of several upgrades, followed by the creation of a large offensive force.

The second rule was designed to be able to deal with the knight's rush. Against the knight's rush, almost all evolved solutions aimed at creating advanced military units quickly. The new rule checks whether it is possible to reach a state that allows the creation of advanced military units by constructing one new building. If that is possible, the rule constructs that building, and creates an offensive force consisting of the advanced military units.

The third rule was aimed at boosting the economy by exploiting additional resource sites. The original rulebases contained a rule to this end, but this rule was invariably assigned a low weight. In the evolved solutions we discovered that exploitation of additional resource sites only occurred after a defensive force was built. The new rule acknowledges this by preparing the exploitation of additional resource sites with the building of a defensive army.

The fourth rule was a straightforward translation of the best solution found against the knight's rush. Simply all activated genes for each state were translated and combined into one rule, and stored in the rulebase corresponding to the state.

To keep the total number of rules constant, the new rules replaced existing rules, namely rules that always ended up with low weights. Besides the creation of the four new rules, small changes were made to some of the existing combat rules, by changing the parameters to increase the number of units of types clearly preferred by the solutions, and to decrease the number of units of types avoided by the solutions. Through these changes, the use of catapults was encouraged. More details on the original and revised rulebases can be found in (Ponsen 2004).

Evaluating the Improved Rulebases in WARGUS

We repeated the first experiment, but with dynamic scripting using the new rulebases, and with the values of the maximum reward and maximum penalty both set to 400, to allow dynamic scripting to reach the boundaries of the weight values faster. Table 3 summarises the achieved results. The columns in Table 3 are equal to those in Table 1.

Tactic   Tests   Low   High   Avg   Med   >100   Won
SBLA     11      10    34     19    14    0      72.5
LBLA     11      10    61     24    26    0      66.4
SR       10      -     -      -     -     10     27.5
KR       10      -     -      -     -     10     10.1

Table 3: Evaluation results of dynamic scripting in RTS games using the improved rulebases.

A comparison of Table 1 and Table 3 shows that the performance of dynamic scripting is considerably improved with the new rulebases. Against the two balanced tactics, SBLA and LBLA, the average RTP is reduced by more than 50%. Against the two optimised tactics, the soldier's rush and the knight's rush, the number of games won out of 100 has increased enormously. Since we observed that dynamic scripting assigned the new rules large weights, the improved performance can be attributed to the new rules.

Note that, despite the improvements, dynamic scripting is still unable to statistically outperform the two rush tactics. The explanation is as follows. The two rush tactics are super-tactics that can only be defeated by very specific counter-tactics, with little room for variation. By design, dynamic scripting generates a variety of tactics at all times, and thus it is unlikely to make the appropriate choices enough times in a row to reach the RTP. A possible solution to this shortcoming of adaptive game AI is to allow it to recognise that an optimised tactic is used, and then oppose it with a pre-programmed answer without activating a learning mechanism. Note, however, that since the existence of super-tactics can be considered a weakness of game design, a better solution would be to change the game design before the release of the game, to make super-tactics impossible.

CONCLUSIONS

We set out to show that offline evolutionary learning can be used to improve the performance of adaptive game AI, by improving the domain knowledge that is used by the adaptive game AI. We implemented an adaptive game AI technique called dynamic scripting, which uses domain knowledge stored in rulebases, in the RTS game WARGUS. We tested the implementation against four manually designed tactics. We observed that, while dynamic scripting was successful in defeating balanced tactics, it did not do well against two optimised rush tactics. We then used evolutionary learning to design tactics able to defeat the rush tactics. Finally, we used the evolved tactics to improve the rulebases of dynamic scripting. From our empirical results we were able to conclude that the new rulebases resulted in significantly improved performance of dynamic scripting against all four tactics.

We draw three conclusions from our experiments. (1) Dynamic scripting can be successfully implemented in RTS games. (2) Offline evolutionary learning can be used to successfully design counter-tactics against strong tactics used in an RTS game. (3) Tactics designed by offline evolutionary learning can be used to improve the domain knowledge used by adaptive game AI, and thus to improve the performance of adaptive game AI.

Future Work

It can be argued that a game is entertaining when the game AI attempts to match the playing strength of the human player, instead of defeating the human player. In parallel research, techniques have been investigated that allow dynamic scripting to scale the difficulty level of the game AI to match the human player's skill, instead of optimising it (Spronck, Sprinkhuizen-Kuyper and Postma 2004c). In future work we will add difficulty-scaling enhancements to dynamic scripting in RTS games. We will also test dynamic scripting in RTS games played against humans, to determine whether adaptive game AI actually increases the entertainment value of a game.

In the present research, the translation of the evolved solutions to improvements in domain knowledge was done manually. Because the translation requires understanding and interpretation of the evolved solutions, it is difficult to perform the translation automatically. Nevertheless, in future work we will attempt to design an automated mechanism that translates tactics evolved by offline evolutionary learning into an improved rulebase for dynamic scripting. The addition of such a mechanism would enable us to completely automate the process of designing successful rulebases for dynamic scripting.

REFERENCES

Brockington, M. and M. Darrah. 2002. How Not to Implement a Basic Scripting Language. AI Game Programming Wisdom (ed. S. Rabin), Charles River Media, Hingham, MA, pp. 548-554.
Buckland, M. 2004. Building Better Genetic Algorithms. AI Game Programming Wisdom 2 (ed. S. Rabin), Charles River Media, Hingham, MA, pp. 649-660.
Buro, M. 2003. RTS Games as Test-Bed for Real-Time AI Research. Proceedings of the 7th Joint Conference on Information Sciences (JCIS 2003) (eds. K. Chen et al.), pp. 481-484.
Cohen, P.R. 1995. Empirical Methods for Artificial Intelligence, MIT Press, Cambridge, MA.
Demasi, P. and A.J. de O. Cruz. 2002. Online Coevolution for Action Games. GAME-ON 2002 3rd International Conference on Intelligent Games and Simulation (eds. Q. Mehdi, N. Gough and M. Cavazza), SCS Europe Bvba, pp. 113-120.
Gold, J. 2004. Object-Oriented Game Development, Addison-Wesley, Harrow, UK.
Goldberg, D.E. 1989. Genetic Algorithms in Search, Optimization & Machine Learning, Addison-Wesley, Reading, UK.
Johnson, S. 2004. Adaptive AI: A Practical Example. AI Game Programming Wisdom 2 (ed. S. Rabin), Charles River Media, Hingham, MA, pp. 639-647.
Laird, J.E. and M. van Lent. 2000. Human-Level AI's Killer Application: Computer Game AI. Proceedings of the AAAI 2000 Fall Symposium on Simulating Human Agents, Technical Report FS-00-03, AAAI Press, pp. 80-87.
Laird, J.E. 2001. It Knows What You're Going To Do: Adding Anticipation to a Quakebot. Proceedings of the Fifth International Conference on Autonomous Agents (eds. J.P. Müller et al.), ACM Press, Montreal, Canada, pp. 385-392.
Manslow, J. 2002. Learning and Adaptation. AI Game Programming Wisdom (ed. S. Rabin), Charles River Media, Hingham, MA, pp. 557-566.
Manslow, J. 2004. Using Reinforcement Learning to Solve AI Control Problems. AI Game Programming Wisdom 2 (ed. S. Rabin), Charles River Media, Hingham, MA, pp. 591-601.
Ponsen, M. 2004. Improving Adaptive AI with Evolutionary Learning. MSc Thesis, Delft University of Technology.
Rabin, S. 2004. AI Game Programming Wisdom 2. Charles River Media, Hingham, MA.
Russell, S. and P. Norvig. 1995. Artificial Intelligence: A Modern Approach. Prentice Hall, Pearson Education, Upper Saddle River, NJ.
Schaeffer, J. 2001. A Gamut of Games. AI Magazine, Vol. 22, No. 3, pp. 29-46.
Spronck, P., I. Sprinkhuizen-Kuyper, and E. Postma. 2004a. Online Adaptation of Game Opponent AI with Dynamic Scripting. International Journal of Intelligent Games and Simulation (eds. N.E. Gough and Q.H. Mehdi), Vol. 3, No. 1, University of Wolverhampton and EUROSIS, pp. 45-53.
Spronck, P., I. Sprinkhuizen-Kuyper, and E. Postma. 2004b. Enhancing the Performance of Dynamic Scripting in Computer Games. Entertainment Computing - ICEC 2004 (ed. M. Rauterberg), LNCS 3166, Springer-Verlag, pp. 273-282.
Spronck, P., I. Sprinkhuizen-Kuyper, and E. Postma. 2004c. Difficulty Scaling of Game AI. Proceedings of the GAME-ON 2004 Conference. (To be published)
Tozour, P. 2002. The Perils of AI Scripting. AI Game Programming Wisdom (ed. S. Rabin), Charles River Media, Hingham, MA, pp. 541-547.