Proceedings of the 2004 IEEE Conference on Cybernetics and Intelligent Systems, Singapore, 1-3 December, 2004

Learning with Imperfections - A Multi-Agent Neural-Genetic Trading System with Differing Levels of Social Learning

Graham Kendall
School of Computer Science and IT, ASAP Research Group
University of Nottingham, Nottingham, NG8 1BB
gxk@cs.nott.ac.uk

Yan Su
School of Computer Science and IT, ASAP Research Group
University of Nottingham, Nottingham, NG8 1BB
yxs@cs.nott.ac.uk

Abstract

Some real life dynamic systems are so large and complex that the individuals inside the system can only partially understand their environment. In other words, the dynamic environment is imperfect to its participants. In this paper, using the stock market as a test bed, we demonstrate an integrated individual learning and social learning model for optimisation problems in dynamic environments with imperfect information. By applying differing levels of social learning in an evolutionary simulated stock market, we study the importance of social learning for the adaptability of artificial agents in imperfect environments. We also compare the integrated individual and social learning model with other evolutionary approaches for dynamic optimisation problems, particularly memory-based approaches and multi-population approaches, with the emphasis on optimisation problems with imperfect information.

Keywords: imperfect environments; neural-genetic system; stock market; individual learning; social learning.

I. INTRODUCTION

Evolutionary algorithms have been widely used for solving optimisation problems in dynamic environments. A dynamic environment is generally characterised by a multimodal and non-static fitness space. Therefore, the aim of an evolutionary algorithm is not only to find optimal solutions, but also to continuously adapt the solutions to a changing environment [1, 2]. However, one important characteristic of dynamic environments has generally been neglected by researchers: the imperfectness of dynamic environments.
By imperfectness, we mean that the individuals in the system can only partially understand their environment, due to the complexity of the dynamic system or the inability of the participants to perceive the entire environment. In other words, different individuals perceive a different view of the information within the environment. Therefore, the environment consists of a number of different search spaces with different dimensions. These search spaces can overlap, or be completely independent.

A good example is the stock market. The theory on the economics of imperfect information points out that different participants in a financial market possess different information from the market [3, 4]. In the stock market, a large number of stock traders and investors consider different views of the market and use different trading strategies to make trading decisions. Although the market is imperfect to stock traders and investors, due to the limited information they gather from the market, there are still a large number of investors who make a profit from the stock market. It is due to the imperfectness and the efficiency of the stock market [5] that some investors can make profits over other investors with inferior information.

In our previous work [6, 7], we developed a multi-agent based simulated stock market model, where artificial stock traders, modelled using Artificial Neural Networks (ANN), co-evolve with each other by means of an integrated individual and social learning algorithm. The experiments in [6, 7] demonstrated that the artificial stock traders learned to develop successful stock trading strategies in an imperfect simulated stock market. In this paper, we centre the study on the integrated individual and social learning algorithm employed in the simulated stock market, from the perspective of learning with imperfect information in dynamic environments. In particular, we study the impact of differing levels of social learning in an evolutionary system on the adaptability of agents in imperfect dynamic environments.
In section II, we discuss the advantages of the integrated individual and social learning algorithm and draw comparisons to other evolutionary approaches for dynamic optimisation problems. Section III describes the simulated stock market and the integrated individual and social learning algorithm. Section IV presents experiments on the simulated stock market with differing levels of social learning. Conclusions and future work are presented in section V.

II. OPTIMISATION IN DYNAMIC ENVIRONMENTS

One problem with applying evolutionary algorithms to dynamic optimisation problems is that they will eventually converge to a local optimum in the search space, and subsequently lose their adaptability when the underlying environment changes. Therefore, most evolutionary approaches endeavour to help the evolutionary algorithm escape from the local optimum and start a new search in the changed fitness space. See [2] and [8] for a good review of evolutionary approaches to dynamic optimisation problems and [9] for some recent trends.

0-7803-8643-4/04/$20.00 © 2004 IEEE
The integrated individual and social learning algorithm employed in our simulated stock market model [6, 7] essentially mimics human learning behaviours in human societies. Every person in a society attempts to maximise their own utility with the resources available to them by means of individual and social learning. In the face of a changing environment, social learning plays an important role in enabling an individual to adapt to new environments by learning from other, better-adapted individuals [10, 11]. Inside our simulated stock market, every artificial stock trader evolves their own individual trading strategy, allowing them to trade more productively utilising a (possibly) unique set of market information. At the same time, the market itself serves as a repository of good trading strategies, and this knowledge is disseminated among the traders over time. The social learning mechanism enables the traders to explore different search spaces that are constructed on different information sets within the environment, and therefore solves the problem of a market which can only be partially understood by its participants due to its imperfectness as a dynamic system.

The two evolutionary approaches discussed in the review [2], an explicit memory-based approach and a multi-population approach, also employ the idea of saving past experiences in a social memory. Explicit memory-based approaches store good solutions from previous generations in a memory and reintroduce them later into the search population [1, 12, 13]. Thus the diversity of the search population is maintained. Such approaches suit environments where changes occur periodically, i.e., environments whose optimum keeps returning to a previous point in the search space. Bendtsen and Krink [14] took a further step by using a dynamic memory that could be adjusted to the changes in the environment by moving externally stored candidate solutions gradually towards the currently nearest best genomes in the search population.
Similarly, the multi-population approach [15] created a self-adaptive memory by maintaining small subpopulations for some promising areas in the search space.

The main differences and advantages of the integrated individual and social learning algorithm employed in the simulated stock market, compared with the above two approaches, are as follows.

Most importantly, the above two approaches do not answer the question about the imperfectness of a dynamic environment. When a dynamic environment has imperfect information, the environment consists of a number of different search spaces. Each of these search spaces has a different search dimension depending on the set of information perceived from the environment. These search spaces may overlap, or may be completely independent from each other. As an example, in the simulated stock market, as in the real market, every stock trader uses a different information set from the market for making trading decisions. The above two approaches enable the evolutionary algorithm to jump from one area to another, new, promising area in the same search space, but they do not solve the problem of moving from one search space to another search space with different information. The integrated individual and social learning algorithm in the simulated stock market solves this problem by modelling the market as an imperfect environment and enabling individuals to learn from others who use different information sets.

Since the environment is non-static, the search spaces in the environment are also non-static. A current search space may disappear from the environment because no one believes that its information is valuable. A new search space may appear because its information has been discovered by someone. As an example, in our simulated stock market, a trader may decide to discard his current strategy and consider a new set of market indicators for developing new trading strategies.

The explicit memory-based approach and the multi-population approach both seem very similar to the concept of social learning.
However, these two approaches are still learning individually from previous experience. In contrast, in our simulated stock market, the artificial traders are modelled as heterogeneous artificial agents with different minds. Before each trader publishes his trading strategy to the society, he will examine not only his own performance, but also his relative performance compared with other traders who use different information.

In this paper, we vary the extent of social learning in the evolutionary market by applying different pressures on artificial traders to take part in the social learning. Results from the experiments show that social learning has a significant impact on the adaptability of the artificial stock traders, and demonstrate the effectiveness of the integrated individual and social learning algorithm in solving learning problems with imperfect information.

III. THE INDIVIDUAL AND SOCIAL LEARNING ALGORITHM

Our simulated stock market is a neural-genetic stock trading system. Basic market information, such as stock prices and trading volumes, is given externally. Artificial stock traders use Artificial Neural Networks (ANN) to detect buy and sell signals from the market, and carry out individual learning by means of a Genetic Algorithm (GA). The following describes the general structure of the trading system:

1. Before trading starts, there are 50 active traders in the simulated stock market. The central pool holds a set of technical and market indicators and zero trading strategies. The indicators are assigned an equal score of 1. Each trader selects a set of market indicators randomly using roulette wheel selection.

2. With the set of indicators selected, each trader generates ten different models in the form of Artificial Neural Networks as trading strategies. These ten models may have different network architectures, but they use the same set of indicators selected by the trader. The aim is for the trader to evolve better trading models by means of individual learning.
3. The time span of the experiment is divided into equal intervals. Each interval contains 125 trading days (6 months of trading).

4. Each 125-day trading period is sub-divided into intervals of 5 days. After each 5-day trading period, individual learning is undertaken by means of a Genetic Algorithm (GA).

5. At the end of each 125-day trading period, social learning occurs and each trader is given the opportunity to decide whether to look for more successful strategies from the pool or whether to publish his/her successful strategies into the central pool, depending on two thresholds θ1 and θ2.

6. After social learning, the system enters the next 125-day trading period and steps 4, 5 and 6 are repeated.

A. Individual Learning

As described above, individual learning occurs every 5 days, at which time the trader calculates each trading model's rate of profit (ROP) using (1), where W is the trader's current assets (cash + shares) and W' is the trader's assets one week before.

$$ROP = \frac{W - W'}{W'} \tag{1}$$

The ROP is used to describe a trading model's profitability. A Genetic Algorithm is used to eliminate trading models with poor performance and introduce new models by mutating network connections and changing network architectures. The pseudocode of the GA is as follows.

    Select the model with the lowest ROP to be eliminated;
    Select a model to be mutated using roulette selection;
    Decide number of connections to be mutated, m;
    i = 0;
    While (i < m) {
        Randomly select a connection;
        weight = weight + w;
        i = i + 1;
    }
    With 1/3 probability add a hidden node;
    With 1/3 probability delete a hidden node;
    Replace the model to be eliminated with the new mutated model;

Here m is a random integer between 0 and the total number of connections in the selected neural network, and w is a random Gaussian number with a mean of zero and a standard deviation of 0.1.

B. Social Learning

Social learning occurs at the end of every 6-month trading period through a self-assessment process. The self-assessment calculates how well the trader has performed in terms of his own profitability and his performance relative to other traders.
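The individual learning step of subsection A can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not the authors' implementation: a model is reduced to a hypothetical dictionary holding a flat weight list and a hidden-node count, ROPs are shifted by the lowest ROP so that roulette selection receives non-negative scores, and the two 1/3-probability structural changes are treated as mutually exclusive outcomes of one draw. All function names are illustrative.

```python
import random


def rate_of_profit(assets_now, assets_before):
    """Equation (1): relative change in total assets (cash + shares)."""
    return (assets_now - assets_before) / assets_before


def roulette_select(scores, rng):
    """Pick an index with probability proportional to its non-negative score."""
    if sum(scores) <= 0:
        # All scores are zero: fall back to a uniform choice (assumption).
        return rng.randrange(len(scores))
    return rng.choices(range(len(scores)), weights=scores)[0]


def mutate(model, rng, sigma=0.1):
    """Return a mutated copy of a model (hypothetical flat representation)."""
    weights = list(model["weights"])
    hidden = model["hidden"]
    m = rng.randint(0, len(weights))  # number of connections to mutate
    for _ in range(m):
        j = rng.randrange(len(weights))      # randomly select a connection
        weights[j] += rng.gauss(0.0, sigma)  # weight = weight + w
    draw = rng.random()
    if draw < 1 / 3:
        hidden += 1                      # add a hidden node
    elif draw < 2 / 3 and hidden > 1:
        hidden -= 1                      # delete a hidden node, keeping at least one (assumption)
    return {"weights": weights, "hidden": hidden}


def individual_learning_step(models, rops, rng):
    """Replace the lowest-ROP model with a mutated copy of a roulette-selected one."""
    worst = min(range(len(models)), key=lambda k: rops[k])
    # Shift ROPs so the scores are non-negative (assumption; ROP can be negative).
    scores = [r - rops[worst] for r in rops]
    parent = roulette_select(scores, rng)
    models[worst] = mutate(models[parent], rng)
    return worst, parent
```

In a fuller implementation, adding or deleting a hidden node would also add or remove the corresponding connection weights; the sketch only tracks the node count to keep the control flow of the pseudocode visible.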
The self-assessment process uses (2), (3), and (4).

$$S_i^{peer} = \frac{R_i}{49} \tag{2}$$

First, trader i's rate of profit (ROP) for the past six months is calculated using (1), and the traders are ranked from 0 to 49 ($R_i$) according to their ROP. Equation (2) gives each trader a score in terms of peer pressure from other traders. In other words, this score shows the trader's performance compared to other traders.

$$S_i^{self} = ROP_i - ROP_i' \tag{3}$$

$ROP_i'$ is the rate of profit for the previous six months. Equation (3) gives the trader's score in terms of his own performance in the past six months compared to the previous six months. Finally, these two types of performance are combined in (4), which gives the final assessment ($\sigma_i$) for trader i.

$$\sigma_i = S_i^{peer} + \frac{1}{1 + e^{-S_i^{self}}} \tag{4}$$

The final assessments for traders are then normalised into the range [0, 1].

In order to study the impact of social learning on the learning abilities of artificial stock traders with imperfect information, we extend the social learning mechanism with four different parameter settings that put different pressures on artificial traders to take part in social learning. The arithmetic mean ($\Phi$) of all traders' normalised final assessments ($\sigma_i$) is calculated using (5).

$$\Phi = \frac{1}{N} \sum_{i=1}^{N} \sigma_i \tag{5}$$

N is the total number of artificial traders, which equals 50. A trader's social behaviour will now fall into one of four categories depending on the values of $\sigma_i$ and $\Phi$, and two thresholds, θ1 and θ2 (see section IV for the description of the thresholds θ1 and θ2). The four cases, i.e. four different types of social learning behaviour, are:
CASE1: The trader is successful and he is not using a strategy learned from the market. The trader will publish his novel trading strategy into the central pool and enter the next six months of trading using the same strategy.

CASE2: The trader is successful and is using a strategy learned from the market. He will not publish the strategy again, but will update this strategy's score in the pool using his six-month ROP (1). The trader will then enter the next six months of trading using the same strategy.

CASE3: The trader is not satisfied with his performance in the last six months of trading. With 0.5 probability, the trader will copy a strategy from the central pool, which means the trader will discard whatever model he is using, select a better trading strategy from the pool using roulette selection, and go into the next six months of trading with this copied strategy. Otherwise, with 0.5 probability, the trader will discard whatever strategy he is using, select another set of indicators as inputs, build new models and go into the next six months of trading with the new trading models.

CASE4: The trader is satisfied with his performance in the past six months and continues using that strategy.

We only provide a general description of the trading system here to ensure the readability of the paper. For more details please refer to [6, 7].

IV. EXPERIMENTAL RESULTS AND DISCUSSION

Five stocks from the Hong Kong Stock Exchange and the Tokyo Stock Exchange are selected to be traded in the simulated stock market: CHEUNG KONG (0001.HK), WHARF HOLDINGS (0004.HK), CATHAY PAC AIR (0293.HK), TOYOTA INDUS CORP (6201.JP) and SONY CORP (6758.JP). The simulated stock market was tested on each of the five stocks. The four different settings with differing levels of social learning are described as follows.

SETTING1: Social learning is turned off and only individual learning occurs. Each trader in the simulated stock market evolves independently of the others, with a different set of information.
Each trader can only search in his own search space, which is defined by the information he selected from the environment.

SETTING2: Both individual learning and social learning are turned on. The threshold θ1 for social learning is set to 1. The threshold θ2 is set to 0.9. Traders' behaviour during the social learning is described in Table I.

TABLE I. TRADERS' ACTIONS DURING THE SOCIAL LEARNING UNDER SETTING 2. SEE SECTION III FOR DESCRIPTION OF CASE 1 TO 4.

    Parameters         σ = θ1         σ < θ2     θ1 > σ ≥ θ2
    Trader's Action    CASE1/CASE2    CASE3      CASE4

Setting2 mimics an environment where only the best players are accepted to distribute their knowledge, and strong motivations are forced on individuals to learn from each other.

SETTING3: Both individual learning and social learning are turned on. The threshold θ1 for social learning is set to 1. The threshold θ2 is set to the mean value Φ (see section III). Traders' behaviour during the social learning is described in Table II.

TABLE II. TRADERS' ACTIONS DURING THE SOCIAL LEARNING UNDER SETTING 3. SEE SECTION III FOR DESCRIPTION OF CASE 1 TO 4.

    Parameters         σ = θ1         σ ≤ θ2     θ1 > σ > θ2
    Trader's Action    CASE1/CASE2    CASE3      CASE4

Setting3 creates an environment where only the best players are accepted to distribute their knowledge, but forces weaker motivations on individuals to learn from each other compared with Setting2.

SETTING4: Both individual learning and social learning are turned on. The parameter θ1 for social learning is set to 0.9. The parameter θ2 is set to the mean value Φ (see section III). Traders' behaviour during the social learning is described in Table III.

TABLE III. TRADERS' ACTIONS DURING THE SOCIAL LEARNING UNDER SETTING 4. SEE SECTION III FOR DESCRIPTION OF CASE 1 TO 4.

    Parameters         σ > θ1         σ ≤ θ2     θ1 ≥ σ > θ2
    Trader's Action    CASE1/CASE2    CASE3      CASE4

Setting4 mimics an environment where more individuals have the opportunity to distribute their knowledge to the society while the learning atmosphere within the society is moderate.

The experimental results are depicted in Fig.
1 to compare the algorithm where no social learning occurs with the algorithms with differing levels of social learning. All results are taken from a single run on each stock under the four different settings. We compare the traders' performance from each simulation with two benchmarks: bank savings and the buy-and-hold strategy. Bank savings means the trader invests the same amount of money in the bank throughout the whole trading period, receiving interest from the bank. Buy-and-hold means the trader keeps his entire asset in a particular stock and holds it until the end of the trading period, in the hope of making a profit through the appreciation of the stock.
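The threshold rules of Tables I to III can be summarised in a short Python sketch. It is illustrative only: the function names are hypothetical, min-max scaling is assumed for the normalisation into [0, 1] (the text does not state the method), and the value returned when all raw assessments are equal is likewise an assumption.

```python
def normalise(raw_scores):
    """Scale raw final assessments into [0, 1] (min-max scaling is an assumption)."""
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        return [0.5 for _ in raw_scores]  # degenerate case, arbitrary choice
    return [(s - lo) / (hi - lo) for s in raw_scores]


def mean_assessment(sigmas):
    """Equation (5): arithmetic mean of the normalised assessments."""
    return sum(sigmas) / len(sigmas)


def choose_case(sigma, phi, setting, uses_pool_strategy):
    """Map a trader's normalised assessment to CASE1..CASE4 for a given setting."""
    if setting == 1:
        return None  # social learning is switched off
    if setting == 2:      # theta1 = 1, theta2 = 0.9 (Table I)
        publish, learn = sigma >= 1.0, sigma < 0.9
    elif setting == 3:    # theta1 = 1, theta2 = phi (Table II)
        publish, learn = sigma >= 1.0, sigma <= phi
    elif setting == 4:    # theta1 = 0.9, theta2 = phi (Table III)
        publish, learn = sigma > 0.9, sigma <= phi
    else:
        raise ValueError("setting must be 1, 2, 3 or 4")
    if publish:
        # CASE2 if the strategy was copied from the pool, CASE1 if it is novel.
        return "CASE2" if uses_pool_strategy else "CASE1"
    if learn:
        return "CASE3"
    return "CASE4"


sigmas = normalise([2.0, 1.0, 0.0, 1.5])
phi = mean_assessment(sigmas)
cases_setting2 = [choose_case(s, phi, 2, False) for s in sigmas]
cases_setting3 = [choose_case(s, phi, 3, False) for s in sigmas]
```

With these four toy assessments, the trader at 0.75 is pushed into CASE3 under SETTING2 (below 0.9) but stays in CASE4 under SETTING3 (above the mean), which is the difference in learning pressure that the settings are meant to create.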
[Figure 1 panels: A. CHEUNG KONG (0001.HK); B. WHARF HOLDINGS (0004.HK); C. CATHAY PAC AIR (0293.HK); D. TOYOTA (6201.JP); E. SONY (6758.JP)]

Figure 1. Comparison between the algorithm without social learning (SETTING1) and the algorithms with different pressures on social learning (SETTING2, 3, 4). On all the X axes, 1 refers to SETTING1, 2 refers to SETTING2, 3 refers to SETTING3, and 4 refers to SETTING4 (see section IV). (a) Number of traders who outperformed bank savings. (b) Number of traders who outperformed the buy-and-hold strategy. (c) The cumulative total return of the best trader among the 50 traders. (d) The average cumulative total return of all 50 traders.
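The two benchmarks and the counts reported in panels (a) and (b) can be illustrated with a small Python sketch. The paper does not give the interest rate or the compounding convention, so the `annual_rate` parameter, daily compounding, and the 250-trading-day year (inferred from 125 trading days per six months) are assumptions, and all function names are hypothetical.

```python
def cumulative_return(initial_assets, final_assets):
    """Cumulative total return over the whole trading period."""
    return (final_assets - initial_assets) / initial_assets


def bank_savings_value(initial_assets, annual_rate, trading_days, days_per_year=250):
    """Final value of a bank deposit, compounded per trading day (assumption)."""
    return initial_assets * (1 + annual_rate / days_per_year) ** trading_days


def buy_and_hold_value(initial_assets, prices):
    """Final value when everything is invested at the first price and held."""
    shares = initial_assets / prices[0]
    return shares * prices[-1]


def count_outperformers(trader_final_assets, benchmark_final_assets):
    """Number of traders whose final assets exceed a benchmark's final value."""
    return sum(1 for v in trader_final_assets if v > benchmark_final_assets)


# Toy numbers, purely illustrative (not taken from the experiments).
hold = buy_and_hold_value(1000, [10, 12, 15])
beaten = count_outperformers([900, 1600, 1500.0], hold)
```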
In Fig. 1, panels A(a) to E(a) show the number of traders who outperformed bank savings under the four different settings on a particular stock. Panels A(b) to E(b) show the number of traders who outperformed the classic buy-and-hold strategy. Panels A(c) to E(c) show the cumulative total returns of the best traders under the four different settings for the particular stock. Panels A(d) to E(d) show the average cumulative total returns of the 50 traders.

It is clear from Fig. 1 that agents who take no part in social learning, i.e., SETTING 1, generally performed poorly when compared to traders for whom social learning was allowed. Looking at A(c), B(c), C(c), D(c) and E(c), there is always an individual and social learning setting that helped traders to find better trading strategies, rewarding the trader with higher returns. A reasonable explanation is that the social learning process enabled the traders to escape from non-promising search spaces, which are limited by the set of information the trader perceived from the imperfect environment, and either enter other promising search spaces which have been well explored by others in the environment, or exploit new sets of information which have not been used by other participants in the market.

As we discussed in section II, because the stock market is a dynamic environment with imperfect information, the optimisation problem in such an environment is not just to find one good solution at a time, but also to adapt to the changed market environment. Social learning enhances an agent's adaptability by enabling agents to learn successful trading strategies from others who are better adapted by making use of different information. The results also demonstrate the effectiveness of the integrated individual and social learning algorithm in solving learning problems with imperfect information.

Concerning SETTING 2, 3, and 4, i.e., scenarios where social learning is applied to different extents in the simulated stock market, by examining B, C, D, E from Fig.
1, we can see that SETTING 2, where only the best players are accepted to distribute knowledge and strong motivations are forced on individuals to learn from each other, generally recorded better performance across the four criteria a, b, c and d. On the CHEUNG KONG stock, SETTING 4 seems to perform better than SETTING 2 and 3, but SETTING 2 still recorded the highest return from the best trader in A(c). Compared with SETTING 2, SETTING 4 puts a moderate pressure on traders to take part in the social learning but gives more individuals the opportunity to distribute their knowledge to the society. Both SETTING 2 and 4 strengthen the social learning in different ways. It is clear that strengthening the social learning improves the adaptability of artificial traders in the imperfect simulated stock market.

V. CONCLUSIONS AND FUTURE WORK

The integrated individual learning and social learning algorithm addresses the question of searching in non-static search spaces in an imperfect environment where different sets of information are perceived by different individuals, a question which is generally neglected by other evolutionary approaches to dynamic optimisation problems. The results from the experiments demonstrate that social learning plays an important role in ensuring the adaptability of agents in an imperfect dynamic environment. The integrated individual and social learning algorithm presents a way of solving learning problems with imperfect information.

When two algorithms are compared, the purpose is either to find a better generic algorithm, or to find an algorithm that is more suitable for a certain problem. From the authors' point of view, we mean the latter. When we compared the integrated individual and social learning algorithm with other evolutionary approaches for dynamic optimisation problems, we stressed that the precondition is a dynamic environment with imperfect information. There are many dynamic optimisation problems that come with perfect information and can be handled by other evolutionary approaches.
For our future work, we intend to study the impact of the frequency of individual and social learning on the imperfect evolutionary system, and the absorption and dissemination of new information from an imperfect environment.

REFERENCES

[1] J. Branke, Memory enhanced evolutionary algorithms for changing optimization problems, in Proc. of Congress on Evolutionary Computation (CEC'99), pp. 1875-1882, 1999.
[2] J. Branke, Evolutionary approaches to dynamic environments - updated survey, in J. Branke, ed, GECCO Workshop on Evolutionary Algorithms for Dynamic Optimization Problems, pp. 27-30, 2001.
[3] M. Rothschild and J. E. Stiglitz, Equilibrium in competitive insurance markets: an essay on the economics of imperfect information, The Quarterly Journal of Economics, vol. 90, no. 4, pp. 629-649, 1976.
[4] R. Arnott, B. Greenwald, R. Kanbur, and B. Nalebuff, Economics for an imperfect world - essays in honor of Joseph E. Stiglitz, The MIT Press, Cambridge Massachusetts, 2003.
[5] E. F. Fama, Efficient capital markets: A review of theory and empirical work, Journal of Finance, May: 383-417, 1970.
[6] G. Kendall and Y. Su, Co-evolution of successful trading strategies in a multi-agent based simulated stock market, in Proc. of The 2003 International Conference on Machine Learning and Applications (ICMLA'03), pp. 200-206, 2003.
[7] G. Kendall and Y. Su, A multi-agent based simulated stock market - testing on different types of stocks, in Proc. of The Congress on Evolutionary Computation (CEC'03), pp. 2298-2305, 2003.
[8] J. Branke, Evolutionary Optimization in Dynamic Environments, Kluwer, 2001.
[9] J. Branke, Evolutionary approaches to dynamic optimization problems - introduction and recent trends, in J. Branke, ed, GECCO Workshop on Evolutionary Algorithms for Dynamic Optimization Problems, pp. 2-4, 2003.
[10] A. Bandura, Social Foundations of Thought and Action: A Social Cognitive Theory, Englewood Cliffs, NJ: Prentice Hall, 1986.
[11] S. Mineka and M. Cook, Social learning and the acquisition of snake fear in monkeys, in T. R. Zentall, et al., eds, Social Learning: Psychological and Biological Perspectives. Lawrence Erlbaum Associates, pp.
51-73, 1988.
[12] S. J. Louis and Z. Xu, Genetic algorithms for open shop scheduling and re-scheduling, in M. E. Cohen and D. L. Hudson, eds, ISCA 11th Intl. Conf. on Computers and Their Applications, pp. 99-102, 1996.
[13] N. Mori, S. Imanishi, H. Kita, and Y. Nishikawa, Adaptation to changing environments by means of the memory based thermodynamical genetic algorithm, in Proc. of The Seventh Intl. Conf. on Genetic Algorithms, pp. 299-306, 1997.
[14] C. N. Bendtsen and T. Krink, Dynamic memory model for non-stationary optimization, in Proc. of Congress on Evolutionary Computation, pp. 145-150, 2002.
[15] J. Branke, T. Kaußler, C. Schmidt, and H. Schmeck, A multi-population approach to dynamic optimization problems, in Adaptive Computing in Design and Manufacturing, Springer, 2000.