Adaptive Load Balancing of Parallel Applications with Multi-Agent Reinforcement Learning on Heterogeneous Systems

Johan PARENT, Katja VERBEECK, Ann NOWE, Kris STEENHAUT
COMO, VUB Brussels, Belgium
Email: johan@info.vub.ac.be, {kaverbee, an.nowe, ksteenha}@vub.ac.be

Jan LEMEIRE, Erik DIRKX
PADX, VUB Brussels, Belgium
Email: lemeire@info.vub.ac.be, erik@info.vub.ac.be

Submitted to the Scientific Programming journal, special issue on Distributed Computing and Computation, IOS Press

ABSTRACT

We report on the improvements that can be achieved by applying machine learning techniques, in particular reinforcement learning, to the dynamic load balancing of parallel applications. The applications considered here are coarse grain data intensive applications. Such applications put high pressure on the interconnect of the hardware. Synchronization and load balancing in complex, heterogeneous networks need fast, flexible, adaptive load balancing algorithms. By viewing a parallel application as a one-state coordination game in the framework of multi-agent reinforcement learning, and by using a recently introduced multi-agent exploration technique, we are able to improve upon the classic job farming approach. The improvements are achieved with limited computation and communication overhead.

Keywords: Parallel processing, adaptive load balancing, reinforcement learning, heterogeneous networks, intelligent agents, data intensive applications.

1. INTRODUCTION

Load balancing is crucial for parallel applications since it ensures a good use of the capacity of the parallel processing units. We look at applications which put high demands on the parallel interconnect in terms of throughput. Examples are compression applications, which both process large amounts of data and require a lot of computation. Data intensive applications [2] require a lot of communication and are therefore problematic for most parallel architectures. The problem is exacerbated when working with heterogeneous parallel hardware. This is the case in our experiment, which uses a heterogeneous cluster of PCs to execute parallel applications with a master-slave software architecture. Adaptive load balancing is indispensable if system performance is unpredictable and no prior knowledge is available [1].

In the multi-agent community, adaptive load balancing is an interesting testbed for multi-agent learning algorithms, and likewise for multi-agent reinforcement learning algorithms as in [9][12]. However, the interpretations and models of load balancing used there do not always reflect real parallel applications. We report on the results obtained by adaptive agents in the farming scheme. The idea is to view all slaves as independent reinforcement learning agents who try to learn the amount of data to request from the master, so as to minimize the total run time of the parallel application and/or the total idle time of the master. As the agents share a common goal, from the game theoretical point of view [10] this setup can be seen as a coordination game. Recently a new exploration technique for individual reinforcement learners used in coordination games was introduced, see [16][17]. This technique allows the slaves to learn independently and adaptively which amount of data to request from the master, aiming at an efficient use of the communication link by the group of agents. Our results show that the multi-agent learning technique improves upon the sequential job farming scheme for parallelizing data intensive applications.

Section 2 defines the problem of load balancing and gives an overview of the existing load balancing strategies. Section 3 discusses performance metrics, while section 4 introduces reinforcement learning and outlines the multi-agent reinforcement learning algorithm. Sections 5 and 6 give the experimental setup and the results achieved. Finally, some conclusions are drawn in section 7.
2. LOAD BALANCING / JOB SCHEDULING

Load balancing aims at assigning to each processor an amount of work proportional to its performance, thereby minimizing the execution time of the program. However, processor heterogeneity and performance fluctuations make static load balancing insufficient [1]. We investigate dynamic, local, distributed load balancing strategies [18], which are based on heuristics, since finding the optimal solution has been shown to be NP-complete in general [11]. Following the agent philosophy, the request assignment strategy is a receiver-initiated algorithm [6], in which the consumers of workload look for producers [13]. The goal is a fast adaptive system that optimizes computation and synchronization.

Problem description

In situations where the communication time is not negligible, as is the case for data intensive applications, faster processing units can incur serious penalties due to slower units. A data request issued
by a slow unit can stall a faster unit when using farming. This of course results in a reduction of the parallelism. This phenomenon is bound to occur when slaves request identical amounts of data from the master, independently of the actual amount requested (neglecting the communication delay, which is acceptable given sufficiently large requests). In order to improve upon the job farming scheme when working with heterogeneous hardware, the slaves have to request different amounts of data from the master (server). Indeed, their respective consumption of communication bandwidth should be proportional to their processing power. Slower processing units should avoid obstructing faster ones by requesting less data.

Computation model

The initial computation model is job farming. In this master-slave architecture the slaves (one per processing unit) request a chunk of a certain size from the master. As soon as the data has been processed, the result is transferred to the master and the slave sends a new request (figure 1).

Fig 1: The farming model: (1) a slave requests a chunk, (2) the master sends the chunk over the shared link (the communication bottleneck), (3) the slave crunches the data on one of the heterogeneous processors, and (4) sends back the result.

This scheme has the advantage of being both simple and efficient. Indeed, in the case of heterogeneous hardware, the load (amount of processed data) will depend on the processing speed of the different processing units. Faster processing units will request data more frequently and thus be able to process more data. The bottleneck of data intensive applications with a master-slave architecture is the link connecting the slaves to the master. In the presented experiments all the slaves share a single link to their master (through an Ethernet switch). In this scenario the application's performance will be influenced by the efficient use of the shared link to the master. Indeed, the fact that the application has a coarse granularity only ensures that the computation/communication ratio is larger than one. It does not preclude a low parallel efficiency, even when using job farming.

3. PERFORMANCE METRICS

We quantify the benefit of parallel processing by the speedup $S = T_{seq}/T_{par}$: how much faster the parallel program runs with respect to the runtime of the sequential version. The portion of $T_{par}$ that is not used for useful computation is therefore considered lost processor cycles [4], or overhead:

$T_{overhead} = p \cdot T_{par} - T_{seq}$    (1)

Hence, the choice of speedup as the performance measure implies that each processor has time $T_{par}$ allocated to perform its part of the job:

$T_{par} = T_1 = T_2 = \dots = T_p$    (2)

with

$T_i = T_{comp,i} + \sum_{\phi} T_{overhead_\phi,i}$    (3)

where $T_{comp,i}$ is the time that processor $i$ performs its part, $w_i$, of the total useful work $W$, and $T_{overhead_\phi,i}$ is the time, not overlapped with computation, of the overhead of type $\phi$. To study the overall impact of the overhead, we can rewrite Eq. (2) as

$p \cdot T_{par} = \sum_{i=1}^{p} T_{comp,i} + \sum_{\phi} T_{overhead_\phi}$    (4)

We will develop the overhead equations for homogeneous processors (with equal computing power), which implies that

$T_{seq} = \sum_{i=1}^{p} T_{comp,i}$    (5)

Together with Eqs. (3) and (4), the speedup can then be rewritten as:

$S = \dfrac{p}{1 + \sum_{\phi} T_{overhead_\phi} / T_{seq}}$    (6)

The impact of the overhead on the speedup is thus reflected by its ratio with the sequential runtime. Let us therefore define this ratio for each overhead type $\phi$:

$Ovh_\phi = \dfrac{T_{overhead_\phi}}{T_{seq}}$    (7)

These ratios quantify the cost of the overheads. Note that $T_{overhead_\phi}$ is the summation of that type of overhead over all processors.
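To make this bookkeeping concrete, the following minimal Python sketch evaluates Eqs. (1)-(7) for illustrative per-processor timings; the numbers and names are hypothetical and serve only to exercise the formulas.

```python
# Minimal sketch of the overhead accounting of Eqs. (1)-(7); the timings
# are illustrative, not measurements from the paper.

def speedup_from_overheads(t_seq, t_comp, overheads):
    """t_comp: per-processor computation times T_comp_i (homogeneous case,
    so sum(t_comp) == t_seq, Eq. (5)). overheads: overhead type ->
    list of per-processor non-overlapped times T_overhead_phi_i (Eq. (3))."""
    p = len(t_comp)
    # Ovh_phi = T_overhead_phi / T_seq, summed over processors (Eq. (7))
    ovh = {phi: sum(ts) / t_seq for phi, ts in overheads.items()}
    # S = p / (1 + sum_phi Ovh_phi)  (Eq. (6))
    return p / (1.0 + sum(ovh.values())), ovh

# Three processors, each computing a third of a 30 s sequential job.
s, ovh = speedup_from_overheads(
    t_seq=30.0,
    t_comp=[10.0, 10.0, 10.0],
    overheads={"comm": [2.0, 2.0, 2.0], "block": [1.0, 2.0, 0.0]},
)
print(s, ovh)  # S ~= 2.31 < p = 3, with Ovh_comm = 0.2 and Ovh_block = 0.1
```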
Heterogeneity

Let us have a look at how system heterogeneity affects the performance. For different computing powers, we introduce the relative speed $\rho_i$, measured with respect to a reference machine:

$\rho_i = \dfrac{T_{comp}(W)}{T_{comp_i}(W)}$    (8)

Unlike most works on heterogeneous parallel systems [19], we express the speeds relative to a fixed machine [3], and not relative to the fastest processor. We think that choosing a fixed reference, with which the sequential runtime $T_{seq}$ is measured, allows a clearer performance analysis. Equation (5) is based on the assumption that the runtime does not depend on a particular portion of the work. Similarly, we assume that the various processors essentially differ in their clock speed [3] and not in the size of the task. The total relative processing power of the parallel system is then:

$PP = \sum_{i=1}^{p} \rho_i$    (9)

For a heterogeneous system, the efficiency is the performance compared with the ideal performance, namely $PP$:

$E = \dfrac{S}{PP}$    (10)

With the average relative speed $\bar{\rho} = PP/p$, we define the degree of heterogeneity $H$ of a parallel system as the standard deviation of $\rho$ [3]:

$H = \sqrt{\dfrac{1}{p} \sum_{i=1}^{p} (\rho_i - \bar{\rho})^2}$    (11)
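As a small illustration of Eqs. (8)-(11), consider hypothetical benchmark times for the same work W on a fixed reference machine and on each processor; reading Eq. (11) as a population standard deviation is our assumption.

```python
# Illustrative relative speeds, processing power, heterogeneity and
# efficiency (Eqs. (8)-(11)); all numbers are made up.

t_ref = 20.0                 # T_comp(W) on the reference machine
t_proc = [20.0, 10.0, 40.0]  # T_comp_i(W) on each processor i

rho = [t_ref / t for t in t_proc]  # relative speeds, Eq. (8) -> [1.0, 2.0, 0.5]
pp = sum(rho)                      # processing power PP, Eq. (9) -> 3.5
rho_avg = pp / len(rho)
# degree of heterogeneity, Eq. (11), taken as the standard deviation of rho
h = (sum((r - rho_avg) ** 2 for r in rho) / len(rho)) ** 0.5  # ~0.62

s = 2.8          # some measured speedup
e = s / pp       # efficiency, Eq. (10) -> 0.8
print(rho, pp, h, e)
```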
To continue the performance analysis of the previous paragraph, Eq. (5) should be replaced by

$T_{seq} = \sum_{i=1}^{p} \rho_i \cdot T_{comp,i}$    (12)

Also, for the calculation of the overhead ratio $Ovh_\phi$ (Eq. (7)), $T_{overhead_\phi,i}$ should be scaled with $\rho_i$:

$Ovh_\phi = \dfrac{\sum_{i=1}^{p} \rho_i \cdot T_{overhead_\phi,i}}{T_{seq}}$    (13)

Thus, we can conclude that all time measurements on a processor $i$ should be scaled with $\rho_i$. Slower processors will carry out less useful work in a given time, and the parallel overheads will have less impact on the speedup. This result corresponds with the approach proposed in [4], which essentially uses processor cycles as time unit.

Granularity

In load balancing problems, the overheads are mainly communication and idle time, which we call blocking overhead. In our setting, the learning algorithm can only improve the performance by minimizing blocking, since the communication time represents the lower bound. Moreover, the communication overhead is proportional to the total data size of the work. For data intensive applications with large messages, all other contributions to the communication time, like the latency or delay, can be neglected. The total work consists of $W$ quantums, where each quantum processes an amount of $q_{data}$ bytes; the communication time then becomes

$T_{comm} = \beta \cdot q_{data} \cdot W$    (14)

where $\beta$ is the time to transmit a byte from the master to any slave. We assume it to be constant, as we do not consider heterogeneous communication networks. In the same way, the computation time can be expressed as a function of the work $W$ by a first order equation, in which the linear term is the main contribution, especially for the linear data intensive applications that we consider. The computation time becomes

$T_{comp} = \tau \cdot q_o \cdot W$    (15)

where $q_o$ is the number of operations per quantum of work and $\tau$ represents the atomic computing time per operation on the reference machine. However, since a piece of code cannot be divided into equal operations, $\tau$ should be considered a reference computing time. It is the product $q_o \cdot \tau$, the length of a run-time quantum, that we will use and that we consider to be constant. We then define the granularity as a relative measure of the amount of computation with respect to the amount of communication within a parallel algorithm implementation [14]:

$gran = \dfrac{T_{comp}}{T_{comm}} = \dfrac{1}{Ovh_{comm}} = \dfrac{\tau \cdot q_o}{\beta \cdot q_{data}}$    (16)

It depends on hardware and software: $\tau / \beta$ is called the hardware granularity and $q_o / q_{data}$ the software granularity. The performance is affected by the overall granularity, independent of how the granularity is spread over software and hardware. We define $gran$ as the granularity measured on the reference machine and $gran_i$ as the granularity of processor $i$. Since the communication time is fixed, the granularity of each processor is

$gran_i = \dfrac{gran}{\rho_i}$    (17)

Load balancing equilibrium

Fig 2: Load balancing with a clear computation bottleneck. Execution profile (a) and overhead distribution (b).
Fig 3: Load balancing with a clear communication bottleneck. Execution profile (a) and overhead distribution (b).

We will investigate the non-trivial case of load balancing where the total computation power of the slaves matches the communication bandwidth of the master for a given total granularity. In other cases, there will be either a clear communication or computation bottleneck, which makes adaptive load balancing unnecessary. Indeed, with a total computation power that is lower than the master's communication bandwidth, there is no bottleneck: the master will be able to serve the slaves constantly and these will work at 100% efficiency. This can be seen in the experiment of figure 2, where the total computation power is lower than the communication bandwidth. In that case, more processors can be added to increase the speedup. On the other hand, when the total computation power increases, the master will get requests at a higher rate and the communication becomes the bottleneck. The slaves will then start to block, waiting to be served, and their efficiency drops. This can be seen in figure 3, where the total computation power is higher than the communication bandwidth. In that case, the surplus of slave processors would better be used for other work. In both cases, the distribution of the work can hardly be optimized. We will therefore investigate the non-trivial case, where computation and communication speed match. In that case, workload assignment and synchronization become necessary.

At the equilibrium point, an ideal load distribution exists, such that the master can serve the slaves constantly and the slaves are fully busy. There will be no blocking on master or slaves, so their parallel runtimes are

$T_1 = \beta \cdot q_{data} \cdot W$    (18)

$T_i = T_{comm,i} + T_{comp,i} = \beta \cdot q_{data} \cdot w_i + gran_i \cdot \beta \cdot q_{data} \cdot w_i \quad (i > 1)$    (19)

where processor 1 is the master, $gran_i$ is the granularity of processor $i$ and $w_i$ its part of the work. Since all parallel runtimes are equal (Eq. (2)), Eq. (18) equals Eq. (19) and $w_i$ can be calculated for each slave:

$w_i = \dfrac{\beta \cdot q_{data} \cdot W}{\beta \cdot q_{data} \cdot (1 + gran_i)} = \dfrac{W}{1 + gran_i}$    (20)

We know that $W$ is the summation of all $w_i$, hence

$W = \sum_{i=2}^{p} w_i = \sum_{i=2}^{p} \dfrac{W}{1 + gran_i}$    (21)

This results in the load balancing equilibrium condition, which indicates a match between the system's communication bandwidth and processing power:

$\sum_{i=2}^{p} \dfrac{1}{1 + gran_i} = 1$    (22)

We introduce the load balancing equilibrium factor lbe to quantify this match:

$lbe = \sum_{i=2}^{p} \dfrac{1}{1 + gran_i}$    (23)

where lbe = 1 reflects an equilibrium, lbe < 1 stands for lower processing power (Fig 2, where lbe = 0.5) and lbe > 1 indicates a communication bottleneck (Fig 3, where lbe = 2).

For a homogeneous system, $gran$ is constant over all processors and, according to Eq. (22), the equilibrium condition becomes $p = gran + 2$; the speedup is then $S = p - 2$. Note that $p$ should be greater than 2: we need at least 2 slaves, so that one can compute while the other communicates. The load balancing equilibrium factor is then

$lbe = \dfrac{p - 1}{1 + gran}$    (24)
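A short numeric sketch of this equilibrium bookkeeping (Eqs. (20), (23) and (24)) follows; the granularity values are illustrative, not those of the experiments in section 6.

```python
# Load balancing equilibrium factor and ideal work shares; the
# granularity values below are illustrative.

def lbe(grans):
    """Load balancing equilibrium factor over the slaves, Eq. (23)."""
    return sum(1.0 / (1.0 + g) for g in grans)

def work_shares(grans):
    """Ideal fractions w_i / W of the total work, Eq. (20)."""
    return [1.0 / (1.0 + g) for g in grans]

# Homogeneous check, Eq. (24): gran = 3 needs p = gran + 2 = 5 processors,
# i.e. p - 1 = 4 slaves, for lbe = 1; the speedup is then S = p - 2 = 3.
print(lbe([3.0] * 4))                # 1.0 -> equilibrium
# Heterogeneous slaves: a lower gran_i (faster processor, Eq. (17))
# earns a larger ideal share of the work.
print(work_shares([1.0, 3.0, 4.0]))  # [0.5, 0.25, 0.2]
print(lbe([1.0, 3.0, 4.0]))          # 0.95 < 1: slight computation bottleneck
```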
4. MULTI-AGENT REINFORCEMENT LEARNING

Reinforcement learning (RL) is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. A model of reinforcement learning consists of a discrete set of environment states, a discrete set of agent actions and a set of scalar reinforcement signals. At each interaction the agent receives a reinforcement and the next state of the environment, and chooses an action. The agent's job is to find a policy, i.e. a mapping from states to actions, which maximizes some long-run measure of reinforcement. These rewards can take place arbitrarily far in the future. To obtain a high overall reward, an agent has to prefer actions that it has tried in the past and found to be good, i.e. exploitation; however, discovering such actions is only possible by trying out alternative actions, i.e. exploration. Neither exploitation nor exploration can be pursued exclusively.

In our load balancing problem setting, the processors are of the receiver-initiated type and can thus be viewed as agents, which we will give extra learning abilities. Each processor will be an independent RL agent, which tries to learn an optimal chunk size of data to ask from the master, so that the blocking time for the others is minimized and therefore also the total computation time. The agents' actions are the possible amounts of data or block sizes available. Because we want to restrict ourselves to reinforcement learners with a discrete set of actions, we specify at the beginning a set of possible block sizes the agents can choose
from. Since there are multiple slaves existing together, which influence each other, we have to use a multi-agent reinforcement learning scheme. In our application, the slaves can only be in one state and they share a common goal. As such, this multi-agent problem can be viewed as a coordination game, in which the agents should learn to converge to the Pareto optimal Nash equilibrium (1) of the game. In [17] an algorithm for learning in a multi-agent coordination game is given. We describe it briefly in the next subsection.

(1) An outcome of a game is said to be Pareto optimal if there exists no other outcome for which all the players simultaneously do better. A Nash equilibrium of a game is a solution for which no agent can do better by changing its policy, as long as all the other agents keep playing their Nash equilibrium policies [10].

Exploration in a multi-agent coordination game

The learning scheme starts with a number of exploration periods and is followed by an exploitation phase. At the beginning of each exploration period, the agents behave selfishly and naively, i.e. they ignore the other agents in the environment and use some traditional reinforcement learning technique to maximize their payoff. The exploration period ends when the agents have found a Nash equilibrium. This will happen, as it is assured by results from learning automata theory [8]; it also holds for other reinforcement learning algorithms such as Q-learning [7][15]. Which Nash equilibrium the agents find is not known in advance; it depends on the initial conditions and the basins of attraction of the different equilibria. The goal is to converge to the best Nash equilibrium, i.e. the Pareto optimal equilibrium. After having converged, every agent excludes its last played action, so that the joint action space becomes considerably smaller. In a new period of play, a new Nash equilibrium will be found. Before a new period starts, the agent keeps some statistics for the action it has converged to. When the agent has no actions left, its original action space is restored. New exploration periods are played until the user decides to stop the exploration phase and to exploit the best Nash equilibrium found. This is achieved without any communication between the agents, because they sense the same payoff for each joint action played. The joint action they choose after exploration is the best they have seen so far. Under some assumptions concerning the period length and the number of periods played, convergence to the Pareto optimal equilibrium is assured, even for stochastic rewards/payoffs, see [17]. In the next subsection we explain how to use this multi-agent learning algorithm in the load balancing experiment, which can be viewed as an asynchronous version (2) of a coordination game.

(2) Asynchronous in the sense that the players do not take their actions synchronously, rewards may be delayed, etc.

5. EXPERIMENTAL SETUP

To assess the presented algorithm for coarse grain data intensive applications on heterogeneous parallel hardware, a synthetic approach has been used. An application has been written using the PVM [5] message-passing library to experiment with the different dimensions of the problem. The application has been designed not to perform any real computation; instead it replaces the computation and communication phases by delays of equivalent duration (3). For the experimental setup, H and lbe are two parameters of the experiment; both are set to 1 in all experiments. The granularity of each processor is then chosen randomly according to a normal distribution whose average is derived from the homogeneous case granularity, Eq. (24), and whose standard deviation is H. This guarantees the experiment to be in an interesting load balancing equilibrium.

(3) The experimental application can easily be turned into a real application by replacing the delay-producing code with real code.
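A minimal sketch of how such a synthetic configuration could be drawn follows. Only the mean granularity, derived from Eq. (24), and the standard deviation H follow the text; the use of a clipped Gaussian and the other sampling details are assumptions.

```python
import random

def synthetic_granularities(n_slaves, h=1.0, lbe_target=1.0):
    # Homogeneous equilibrium, Eq. (24): lbe = (p - 1) / (1 + gran), so
    # with p - 1 = n_slaves the mean granularity is n_slaves / lbe - 1.
    mean_gran = n_slaves / lbe_target - 1.0
    # Clipping at 0 is an assumption, to keep granularities meaningful.
    return [max(0.0, random.gauss(mean_gran, h)) for _ in range(n_slaves)]

grans = synthetic_granularities(3)          # mean granularity 2.0, stddev 1.0
print(grans)
print(sum(1.0 / (1.0 + g) for g in grans))  # lbe close to 1 on average
```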
Learning to request data

In our load balancing setting, the slaves use a reward-inaction learning automata scheme for selfish play during the exploration periods [8]. A learning automaton describes the internal state of an agent as a probability distribution according to which actions are chosen. These probabilities are adjusted according to the success or failure of the actions taken. The update scheme we use here is the reward-inaction update scheme, given in Eq. (25). A constant reward parameter $a$ between 0 and 1 is used to reinforce good actions. The feedback $r$ gives the reinforcement from the environment. For this scheme it is proven that players will converge to a Nash equilibrium when it is used in a one-state coordination game, see [8].

$p_i(n+1) = p_i(n) + a \cdot r(n) \cdot (1 - p_i(n))$  if action $i$ was chosen at time $n$
$p_j(n+1) = p_j(n) - a \cdot r(n) \cdot p_j(n)$  for all actions $j \neq i$    (25)

The feedback $r$ provided to the learner is the inverse of the blocking time, i.e. of the time a slave has to wait before the master acknowledges its request for data. It is used to update the action probabilities by Eq. (25). Less interesting actions will incur higher blocking times and thus will have a lower probability associated with them. In our experiment, each slave will learn to request a chunk size that minimizes its blocking time. To that end each slave has a stochastic learning automaton which uses Eq. (25), as shown in Fig 4. In the presented results the block size is a multiple of a given initial block size; here the multiples are 1, 2 and 3 times the initial block size.
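A minimal sketch of the reward-inaction automaton of Eq. (25), as one slave could run it, is given below. The action set (multiples 1, 2 and 3 of an initial block size) follows the text; mapping the inverse blocking time into a bounded reward $r \in [0, 1]$, as Eq. (25) requires, is done here by an assumed normalization.

```python
import random

class RewardInactionAutomaton:
    """One slave's learning automaton over the possible chunk sizes."""

    def __init__(self, n_actions, a=0.1):
        self.p = [1.0 / n_actions] * n_actions  # action probabilities
        self.a = a                              # reward parameter, 0 < a < 1

    def choose(self):
        return random.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, chosen, r):
        """Eq. (25): reinforce the chosen action with reward r in [0, 1]."""
        for i in range(len(self.p)):
            if i == chosen:
                self.p[i] += self.a * r * (1.0 - self.p[i])
            else:
                self.p[i] -= self.a * r * self.p[i]

# One learning step: pick a block size, observe the blocking time at the
# master, and feed back a normalized inverse blocking time.
automaton = RewardInactionAutomaton(n_actions=3)  # multiples 1, 2, 3
action = automaton.choose()
blocking_time = 0.8                    # seconds, as measured by the slave
r = 1.0 / (1.0 + blocking_time)        # in (0, 1]; normalization assumed
automaton.update(action, r)
print(automaton.p)                     # probabilities still sum to 1
```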
Fig 4: Reinforcement learning during an exploration period. Each slave keeps a probability vector P = (P1, P2, P3) over the chunk sizes S1, S2, S3; a chunk size S is drawn at random according to P and requested from the master (blocking, then communication and computation of S), after which the feedback 1/blocking time updates P.

At the end of an exploration period the slaves have converged to a Nash equilibrium, i.e. each slave has learnt a block size to ask from the master. They send the master their average performance during this period, which is the average amount of data each was able to compute. When the master has received this message from each slave, the overall reward is sent back to each slave, which uses it to update its statistics for the action last played. The overall reward the agents receive for the joint action played is given by the total average amount of data the agents computed in parallel during the period of play. This payoff can easily be attached to the data packages the master sends to its slaves, so no extra communication is generated. The same argument goes for the slaves sending their period information to the master. The number of exploration periods is chosen in advance. After all periods have been played, the slaves choose the action which was rewarded best by the master. We will compare our adaptive algorithm with a static load balancing scheme where fixed amounts of data are requested.

6. EXPERIMENTS

Figure 5 shows the time course of a typical experiment in its exploitation period, with the computation, communication and blocking phases (data size = 10 GB, communication speed = 10 MB/s, #slaves = 3, average chunk size = 1 MB, H = 1, lbe = 1, length of an exploration period = 100 s, number of periods = 10).

Fig 5: Execution profile of the exploitation phase of a typical experiment with 3 slaves.

We observe that the slaves have learned to distribute the requests nicely and hence use the link with the master efficiently. Slave 3, which is the fastest processor with the lowest granularity, is served constantly and has no blocking time. Slave 2 has some blocking time. However, it is the slowest processor, so the processing power of the system is maximally exploited.
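The following event-driven sketch mimics this exploitation-phase behavior under the model of section 3: one shared link that serializes the chunk transfers, and a computation time of gran_i times the communication time per chunk (Eqs. (14), (16) and (17)). It illustrates the mechanism only; it is not the PVM application used for the measurements, and all parameter values are made up.

```python
def simulate(chunk_s, grans, total_data, beta=1.0):
    """Farm `total_data` over slaves requesting chunks of size chunk_s[i];
    returns the per-slave blocking times and the approximate makespan."""
    link_free = 0.0                  # when the master's link is next free
    ready = [0.0] * len(grans)       # when each slave issues its next request
    blocking = [0.0] * len(grans)
    sent = 0.0
    while sent < total_data:
        i = min(range(len(grans)), key=lambda k: ready[k])  # earliest request
        start = max(ready[i], link_free)   # wait while the link is busy
        blocking[i] += start - ready[i]
        t_comm = beta * chunk_s[i]         # Eq. (14), for one chunk
        link_free = start + t_comm         # the shared link is serialized
        ready[i] = link_free + grans[i] * t_comm  # t_comp = gran_i * t_comm
        sent += chunk_s[i]
    return blocking, max(ready)

# Three heterogeneous slaves; slave 3 is the fastest (lowest granularity)
# and requests the largest chunks, as the learned joint actions tend to do.
blocking, makespan = simulate(chunk_s=[1.0, 2.0, 3.0],
                              grans=[4.0, 2.0, 1.0],
                              total_data=1000.0)
print(blocking, makespan)
```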
Performance results

As shown in table 1, we have run several experiments with varying joint action spaces (JAS). We use up to 7 slaves with 5 possible actions each, which results in a joint action space with 5^7 possibilities. The 4th column gives the absolute gain in total computation time of the adaptive multi-agent algorithm compared to the total computation time of the farming algorithm. The 5th column gives the maximal gain in total computation time that is possible for the given setting: the total computation time can only decrease when the idle time of the master is decreased, and no gain is possible on the total communication time of the data. The last column therefore gives the absolute gain compared to the maximal possible gain. In all experiments considerable improvements were made. The agents are able to find settings in which the faster slave is blocked less than the others, see also Fig. 5. The adaptive characteristic of the agents makes them able to take their heterogeneity into account. In larger joint action spaces the gain drops, but by adjusting the parameters of the learning algorithm, i.e. making sure the agents can explore more by enlarging the length of the exploration periods and increasing the number of exploration periods, performance gets better again in the last experiments: the data size was increased from 10 to 30 GB in the larger joint action spaces, the period length from 100 to 200 time steps and the number of periods from 10 to 20. All the other settings were taken as above. The figures shown are results averaged over 10 runs of each experiment.

Slaves  Actions  #JAS   abs. gain (s)  max. gain (s)  rel. gain (%)
2       3        9       8.40          13.54           62.00
2       4        16      8.32          13.54           60.99
2       5        25      6.64          13.54           49.02
3       3        27      4.69          14.14           33.17
3       5        125     4.33          14.14           30.65
5       3        243     1.39          13.13           10.60
5       5        3125    1.61          13.12           12.24
7       3        2187    2.40          12.20           16.91
7       5        78125   3.47          14.20           24.42

Table 1: Average gain in total computation time for adaptive load balancing compared to static load balancing (farming).

Table 2 shows the results for uniform load balancing. In the uniform version, instead of each agent learning the best chunk size to ask out of a given set of possible sizes, the agents choose each possible chunk size with a uniform distribution. The results show that uniform play is on average comparable to static load balancing. So, our learning approach performs better than both.
Slaves  Actions  #JAS   abs. gain (s)  max. gain (s)  rel. gain (%)
2       3        9       1.32          13.54            9.71
2       4        16     -1.41          13.54          -10.41
2       5        25     -1.30          13.54           -9.56
3       3        27     -0.87          14.14           -6.12
3       5        125    -1.48          14.14          -10.44
5       3        243    -1.61          13.13          -12.28
5       5        3125   -1.78          13.12          -13.59
7       3        2187    1.01          14.20            7.11
7       5        78125   0.84          14.20            5.89

Table 2: Average gain in total computation time for uniform load balancing compared to static load balancing (farming).

Overhead analysis

Table 3 gives an overview of the gain in blocking overhead. The blocking time represents the cumulated blocking time of all the slaves during the parallel run; the idle time is the idle time of the master. The numbers presented here are averaged over 10 independent runs. Initially, a parallel run (using job farming) with 3 slaves which can use 3 different block sizes resulted in an average of 165.27 seconds of idle time for the master. When learning is used, the idle time of the master reduces to 115.08 seconds, a gain of 30.37%. The gain in blocking time follows the same interpretation.

Slaves  Actions  Idle (gain)  Blocking (gain)
2       3        55.6%        50.7%
2       4        54.3%        51.3%
2       5        47.7%        38.7%
3       3        30.3%        20.7%
3       5        26.8%        20.7%
5       3         4.0%        -3.4%
5       5         0.98%        3.20%
7       3         9.70%        3.00%
7       5        11.6%         3.00%

Table 3: Average gain for adaptive load balancing in idle time of the master and total blocking time of the slaves.

As shown, the task of reducing the blocking time becomes harder as the number of slaves increases. This is a direct result of the exponential increase in the size of the search space.

7. CONCLUSIONS

Controlling complex, heterogeneous systems for optimal use by means of multi-agent reinforcement learning is promising. We implemented reinforcement learners for the distributed load balancing of data intensive applications. Together they use a novel technique of coordinated exploration, which had already been tested and proven to converge to the Pareto optimal Nash equilibrium in stochastic coordination games from game theory. The master-slave setup we used for our load balancing experiments was viewed as an asynchronous coordination game, and the results show that the multi-agent algorithm used still works for asynchronous games. The first performance results show considerable improvements upon a static load balancer. The agents are able to find settings in which the faster slave is blocked less during communication with the master than the others. The adaptive characteristic of the agents makes them able to take their heterogeneity into account. Moreover, when a slave is removed from the set-up, the remaining agents can adjust themselves to the new situation. The algorithm works locally on the slaves (receiver-initiated), which thus act like intelligent agents. The agents use both local and global information in their learning process; however, no extra communication between the slaves is needed to obtain this global information. Every agent communicates its period performance to the master, which in turn sends the global information back to every slave. This extra information can be attached to the regular data packages and requests, without causing extra overhead. We are planning to use this multi-agent approach for data intensive applications such as compression applications.

8. REFERENCES

[1] I. Banicescu and V. Velusamy, "Load Balancing Highly Irregular Computations with the Adaptive Factoring", Proceedings of the 16th International Parallel & Distributed Processing Symposium, IEEE, Los Alamitos, California, 2002.
[2] M. D. Beynon, T. Kurc et al., "Efficient Manipulation of Large Datasets on Heterogeneous Storage Systems", Proceedings of the 16th International Parallel & Distributed Processing Symposium, IEEE, Los Alamitos, California, 2002.
[3] A. Clematis and A. Corana, "Modeling performance of heterogeneous parallel computing systems", Parallel Computing 25, Elsevier Science, 1999.
and Leblanc,.J., Parallel Performance Predcton usng Lost Cycles Analyss, n Proc. of Suercomutng 94, IEEE Comuter Socety, 1994. [5] A. Gest, A. Begueln et al., PVM: Parallel Vrtual Machne, the MI ress, 1994. [6] D. Guta and P. Be, "Load sharng n dstrbuted systems", In Proceedngs of the Natonal Worksho on Dstrbuted Comutng, January 1999. [7] Kaelblng L.P., Ltmann M.L., Moore A.W.,: Renforcement Learnng: A Survey. Journal of Artfcal Intellgence Research 4 (1996) 237-285. [8] Narendra K., hathachar M., Learnng Automata: An Introducton, Prentce-Hall (1989). [9] Nowé, A., Verbeeck, K., Dstrbuted Renforcement learnng, Loadbased Routng a case study, Proceedngs of the Neural, Symbolc and Renforcement Methods for sequence Learnng
Workshop at IJCAI'99, 1999.
[10] M. J. Osborne and A. Rubinstein, A Course in Game Theory, MIT Press, Cambridge, MA, 1994.
[11] C. C. Price and S. Krishnaprasad, "Software allocation models for distributed systems", Proceedings of the 5th International Conference on Distributed Computing, pages 40-47, 1984.
[12] A. Schaerf, Y. Shoham and M. Tennenholtz, "Adaptive Load Balancing: A Study in Multi-Agent Learning", Journal of Artificial Intelligence Research 2 (1995) 475-500.
[13] T. Schnekenburger and G. Rackl, "Implementing Dynamic Load Distribution Strategies with Orbix", International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'97), Las Vegas, Nevada, 1997.
[14] H. S. Stone, High-Performance Computer Architecture, Addison-Wesley, Massachusetts, 1993.
[15] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
[16] K. Verbeeck, A. Nowé, T. Lenaerts and J. Parent, "Learning to Reach the Pareto Optimal Nash Equilibrium as a Team", LNAI 2557, Proceedings of the 15th Australian Joint Conference on Artificial Intelligence, pp. 407-418, 2002.
[17] K. Verbeeck, A. Nowé and K. Tuyls, "Coordinated Exploration in Stochastic Common Interest Games", Third Symposium on Adaptive Agents and Multi-Agent Systems (AAMAS-3), 2003.
[18] M. J. Zaki, W. Li and S. Parthasarathy, "Customized dynamic load balancing for a network of workstations", Proceedings of High Performance Distributed Computing (HPDC'96), IEEE, 1996.
[19] X. Zhang and Y. Yan, "Modeling and characterizing parallel computing performance on heterogeneous networks of workstations", Proc. of the 7th IEEE Symposium on Parallel and Distributed Processing, IEEE, 1995.