Modeling POMDPs for Generating and Simulating Stock Investment Policies

Modeling POMDPs for Generting nd Simulting Stock Investment Policies Augusto Cesr Espíndol Bff UNIRIO - Dep. Informátic Aplicd Av. Psteur, 458 - Térreo Rio de Jneiro - Brzil ugusto.bff@uniriotec.br Angelo E. M. Cirlini UNIRIO - Dep. Informátic Aplicd Av. Psteur, 458 - Térreo Rio de Jneiro - Brzil ngelo.cirlini@uniriotec.br ABSTRACT Anlysts nd investors use Technicl Anlysis tools to crete chrts nd price indictors tht help them in decision mking. Chrt ptterns nd indictors re not deterministic nd even nlysts my hve different interprettions, depending on their experience, bckground nd emotionl stte. In this wy, tools tht llow users to formlize these concepts nd study investment policies bsed on them cn provide more solid bsis for decision mking. In this pper, we present tool we hve built to formlly model stock investment contexts s Prtilly Observble Mrkov Decision Processes (POMDP), so tht investment policies in the stock mrket cn be generted nd simulted, tking into considertion the ccurcy of Technicl Anlysis techniques. In our models, we ssume tht the trend for the future prices is prt of the stte t certin time nd cn be prtilly observed by mens of Technicl Anlysis techniques. Historicl series re used to provide probbilities relted to the ccurcy of Technicl Anlysis techniques, which re used by n utomted plnning lgorithm to crete policies tht try to mximize the profit. The tool lso provides flexibility for trying nd compring different models. Ctegories nd Subject Descriptors I.2.8 [Control Methods]: Pln execution, formtion, genertion. D.4.8 [Performnce] Mesurements, Modeling nd prediction, Monitors, Simultion. Generl Terms Algorithms, Design, Economics nd Experimenttion Keywords POMDP, Simultion, Stock Mrket, Technicl Anlysis Permission to mke digitl or hrd copies of ll or prt of this work for personl or clssroom use is grnted without fee provided tht copies re not mde or distributed for profit or commercil dvntge nd tht copies ber this notice nd the full cittion on the first pge. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission nd/or fee. SAC 10, Mrch 22-26, 2010, Sierre, Switzerlnd. Copyright 2010 ACM 978-1-60558-638-0/10/03 $10.00 1. INTRODUCTION Stock Mrket Anlysis cn be seen s nondeterministic nd prtilly observble problem becuse there is no wy to be sure bout the results of certin decision. Even experienced nlysts re not ble to predict ll fctors tht might ffect stock prices. In Technicl Anlysis techniques [9], it is ssumed Dow s hypothesis [8], which sttes tht time series re the only relevnt source of informtion to estimte the best time to buy or sell stocks. Technicl Anlysis provides mny sttisticl methods to study price series such s chrt ptterns, indictors nd other concepts, but there re severl wys to interpret dt. Investors cn be either in long positions, in which they bet tht prices of certin stock will increse or in short positions, in which they bet tht prices will fll. Long positions cn be ssumed by simply buying ssets, expecting to sell them t higher prices. Short positions cn be ssumed by borrowing ssets nd selling them with the intention of buying them bck lter t lower prices. Idelly, Technicl Anlysis techniques could indicte when the investor should ssume either long or short position. The problem is tht Technicl Anlysis techniques re neither deterministic nor perfect. A tool for modeling, testing nd combining the concepts should help in decision mking, considering the risks nd potentil benefits of certin decision. Pst decisions cn influence future results nd decisions. In this wy, the estimtion of bull or ber periods is not enough to mke the right decision t certin time. Buying fter bull mrket, for exmple, should consider the chnces of continued bull mrket to increse profits. It is then necessry to dopt policies tht tke into considertion risks nd benefits of combining decisions in the long term. In this pper, we describe tool we hve developed to support modeling context of investment in the stock mrket s Prtilly Observble Mrkov Decision Process (POMDP) [3], so tht investment policies cn be utomticlly generted, simulted nd evluted. The tool llows the user to formlly model wht composes stte t certin time nd to formlly specify Technicl Anlysis sensors tht indicte the occurrence of ptterns in time series. In order to model the process s POMDP, we hve to ssume Mrkov hypothesis, estblishing tht the next stte depends only on the current stte. In this wy, we try to encpsulte within stte ll informtion tht could be relevnt for decision mking. A stte contins dt tht cn be directly observed such s the

investor s position nd price dt from the pst (up to the point tht we consider useful). Additionlly, we ssume tht there is lwys trend for the future prices. In order to incorporte this ide, sttes lso contin informtion bout prices in the ner future. This prt of the stte cnnot be directly observed, but we cn try to prtilly observe it vi our Technicl Anlysis sensors. Bsed on the forml specifictions for sttes nd sensors, time series re investigted to estblish probbilities of chnging from one stte to nother nd of observing Technicl Anlysis pttern t certin stte. Probbilities cn lso be rtificilly specified by experienced nlysts nd compred to occurrences in the time series. Hving probbilistic model for our POMDP, the tool uses utomted plnning techniques [16] to generte policies. These policies define which ctions should be executed ccording to the observtions obtined from the environment with the sensors. The sme sensors tht were previously used to lern the policy provide observtions to llow the selection of the corresponding ction. Policies re evluted in different mrket scenes (bull, ber, flt). A simultor is used to evlute policies using either rel time mrket informtion or mrket series from dtbse. The tool llows the user, to test vrious Technicl Anlysis concepts nd check their relibility in different scenrios. Relted works on Artificil Intelligence reserch hve lso imed to provide support for decision-mking in stock mrkets. Zhedim nd Chong [23] showed how to define sttes to mnge portfolios nd proposed model to djust the quntity of shres bsed on benchmrk. Elder [7] proposed reinforcement lerning pproch. Lin, Co, Wng nd Zhng [14] proposed dt mining pproch using genetic lgorithms to nlyze shres. Egeli, Ozturn nd Bdur [6], Lezos nd Tull [13], nd Dvey, Hunt nd Frnk [5] showed different techniques of prediction nd forecst using neurl networks. The min difference of our pproch from previous work is tht we try to crete pltform for modeling lterntive POMDPs incorporting different Technicl Anlysis techniques nd bsed on historic dt. Policies re then generted nd simulted tking into considertion the efficcy of ech technique to predict price trends, so tht the best techniques for ech period nd sset cn be chosen. 2. POMDP A Prtilly Observble Mrkov Decision Process (POMDP) is frmework to describe process in which n gent hs to execute ctions without being ble to directly observe the underlying stte. The gent hs then to reson bsed on some locl observtions nd on probbility distributions over the sttes of the world being modeled. The set of probbilities of being in specific sttes t certin time is clled belief. The domin should be modeled s stochstic system with nondeterministic stte trnsitions cused by ctions. Trnsitions from sttes by mens of ctions re lso ssocited with specific probbilities. More formlly, POMDP is represented by the tuple = ( S, A, O, T, P, C, R) where: S is finite set of sttes, A is finite set of ctions nd O is finite set of observtions. T ( s' is probbility distribution. For ny A, s S nd s' S, ( s' is the probbility of reching stte s' T fter executing ction in s. If, for ny stte s nd ction, T ( s' 0, then T ( s' = 1. s' S P ( o is probbility distribution. For ny A, s S nd o O, P ( o is the probbility of observing o in stte s fter executing. If, for ny ction nd observtion o, P ( o 0, then P ( o = 1. o O C : S A R is function tht ssigns cost to ech stte nd ction. R : S R is function tht ssigns ech stte to rewrd. As the current stte is unknown, it is necessry to reson bsed on belief, tht is, distribution probbility of being t ech stte. Let us cll the set of ll possible beliefs B. If b B is belief stte, the probbility of being in stte s is denoted by b (. The function b ( clcultes the probbility of reching stte s by executing ction from the current belief stte b. b ( = s' S T ( s s' ) b( s' ) The function b (o) clcultes the probbility of observing o, fter hving executed ction, from the current belief stte b. b ( o) = P ( o b ( s S So, given belief stte b, the execution of n ction results in new belief stte defined by b o fter hving observed o, s described by the expression: o P ( o b ( b ( = b ( o) The rewrd for belief stte b when ction is executed is clculted by the function: ρ ( b, ) = b( ( R( C( s, )) s S A solution for POMDP is policy π : B A tht mps belief sttes into ctions. A discount fctor γ is normlly used to generte policies tht give more importnce to recent benefits nd costs. An optiml policy is policy tht mximizes the possibility of future benefits, determined by the difference between rewrds nd costs. The POMDP plnning problem is clssified s n optimiztion problem. Its objective is the genertion of policy tht determines, for ech belief stte b, the best ction to mximize the expected benefit E(b) s described by the function: A o E ( b) = mx{ ρ ( b, ) + γ b ( o) E( b )} o O A possible method for solving POMDP problems corresponds to lgorithms tht convert POMDPs to completely observble Mrkov Decision Processes (MDP bsed on belief sttes insted of domin sttes. The resulting MDP is then solved by mens of lgorithms like Policy Itertion [19] or Vlue Itertion [2]. However, s the set of belief sttes is usully infinite nd continuous, POMDPs re computtionlly very hrd to solve. Algorithms tht return optiml policies work only with very smll

problems. Algorithms tht generte pproximtions re then usully dopted. 3. ARCHITECTURE AND METHODOLOGY In order to study policies for investment in the stock mrket, we propose tool in which the user cn specify POMDP models. Probbilities re then collected from pst dt (usully from long time intervl) nd policy is obtined by mens of n lgorithm tht genertes pproximte optiml solutions. Finlly, the efficcy of the model nd the corresponding generted policy is evluted by mens of simulting the doption of the policy to other time intervls. All the process is bsed on the ide tht, t ny time, there is trend for the prices of certin sset. In this wy, stte corresponding to certin time cn be composed of recent dt, the current investor s position nd the current trend (which corresponds to prices in the ner future). We ssume tht we might not know the trend, but it is lwys confirmed nd cn be prtilly observed by mens of Technicl Anlysis sensors. In this wy, prices in the ner future re prt of the underlying stte t ech time. At rel-time we never know future prices, but, when we exmine the historic, prices fter certin time in the pst re known nd cn be compred to Technicl Anlysis indictions t tht time. Probbilities relting indictions to future prices cn then be collected. When we model the problem s POMDP, we ssume tht Technicl Anlysis concepts might not be perfect when compred to rel dt, which is modeled by the collected probbilities. By using these probbilities, we re ble to crete policies tht try to combine ctions in order to increse profits. The influence of pst dt on future prices my vry nd the best interprettion for Technicl Anlysis concepts my lso vry. In this wy, our tool tries to provide flexibility both for modeling the composition of stte nd for creting nd using Technicl Anlysis sensors. Vrious POMDPs cn then be specified nd solved in prllel, generting vrious lterntive policies tht cn be evluted before being used to support decision mking t reltime. As the solution generted by plnners tht solve POMDPs is usully n pproximtion of the optiml solution, we lso try to provide flexibility for experimenting with different POMDP plnners. Figure 1 presents the overll rchitecture of the tool. The Context Controller module llows the user to crete POMDP model, by defining sttes, ctions, trnsitions nd the wy observtions re to be produced by sensors. Bsed on this model, specific modules re ctivted by the Context Controller to solve the POMDP. Besides mrket dt, sttes contin informtion relted to the investment position (long, short or off). This is the prt of the stte tht is directly modified by ctions. As mentioned erlier, ctions hve costs nd sttes hve rewrds. The model should be specified in such wy tht sttes tht generte profits re ssigned to positive rewrds nd those tht cuse losses re ssigned to negtive rewrds. The Probbility Genertor module nlyzes time series stored in the dtbse to ssign probbilities to trnsitions between sttes nd to observtions t ech stte. It is usully nlyzed long time intervl for estimting the probbilities, which should not overlp with time intervls used for evluting policies. Probbilities re obtined by counting the number of times certin stte trnsition occurred nd the number of times n observtion is verified t certin stte. Probbilities cn lso be djusted or inserted mnully by users, ccording to their experience. User Time Series Stock Mrket Test Series Context Controller Probbility Genertor Observtion Genertion Sensor 1 Sensor x Mrket Simultor Plnner Context: -Sttes -Actions -Trnsitions -Observtion Model -Probbilities -Policies Agent User Figure 1- Overll Architecture The Plnner module is responsible for the utomted plnning process tht genertes policies to be stored in the dtbse nd used in future simultions nd t rel-time. Different lgorithms cn be used to solve POMDPs ccording to the user's preference. Observtions re generted by mens of sensors tht implement the Technicl Anlysis concepts. Sensors trnslte mrket dt to discrete observtions, such s estimtes of high, bull mrket, low, ber mrket, flt mrket, oscilltion nd trend reversl, ccording to the model provided by the user. Observtions cn be generted by single indictor or chrt pttern or by combintion. The Observtion Genertor module produces observtions in ccordnce with user definitions nd dt provided by the Mrket module. Observtions re used during the genertion of probbilities nd for pplying policies either in simultions or t rel-time for decision support. The Mrket module is responsible for cquiring dt from broker or stock exchnge. It cn lso simulte the stock mrket informtion using historicl dt provided by the Simultor module. The Agent module is responsible for executing the policies in ccordnce with observtions provided by the Observtion Genertor. When n observtion is informed, the corresponding ction is selected nd new belief stte is reched. Actions cn be informed to the user t rel-time s suggestion. They cn lso be sent to the Simultor module, which simultes the execution of ctions nd evlutes performnces. Either rel-time mrket dt or test time-series cn be used in this simultion. Reports bout the qulity of the policies re utomticlly generted. The bsic indictors tht cn be used in the sensors re: Moving Averges, Moving Averge Oscilltors nd [15], MACD (Moving

Averge Convergence/Divergence) [1], DUAL CCI (Commodity Chnnel Index) [11], Momentum nd ROC (Rte of Chnge) [9], Prbolic SAR nd RSI (Reltive Strength Index) [22] nd Stochstic Oscilltor (SO)[12]. Cndlesticks models [17], pivot supports nd resistnces [9] cn lso be used to produce observtions. The tool hs incorported open-source plnners for solving POMDPs. We hve resorted so fr to the lgorithms implemented by Cssndr [4], which re vilble t http://www.pomdp.org. As the computtionl complexity of the problem is high, the execution of these lgorithms for obtining pproximte solutions usully tkes hours. We re currently investigting the incorportion of other more recent lgorithms, such s those defined in [10, 20, 21]. In ddition, Grid Computing Frmework ws implemented to do vrious plnning jobs in prllel. This rchitecture enbles the tool to process different ssets nd/or test different models simultneously. During our tests, 4 PCs hve been used, ech one ws ble to process 2 jobs in prllel. The verge time for obtining policy for the investment model described in the next section hs been 8 hours. 4. BASIC INVESTMENT MODEL In our pproch, we do not propose single POMDP model. The right model might depend on the specific stocks nd on the time period it is pplied to. We tried insted to provide mens to crete nd evlute models. Nevertheless, in order to test our rchitecture nd methodology, we proposed n initil bsic model. Given the complexity of POMDP plnning, the model must hve few sttes, otherwise the utomtic genertion of investment policies might not be trctble. We decided to initilly work only on the spot mrket, nlyzing single stock or index. In this wy, we proposed stte composed of only three vribles: () mrket trend (bull, ber or flt) in the previous 6 dys; (b) mrket trend in the following 6 dys; nd (c) investor s position (long, short or off). As ech vrible cn ssume 3 different vlues, we hve 27 different sttes. Ech trend is defined by rnge of price vrition. Rnges cn be djusted by the user. For 6-dy period, we ssumed the following trends in ccordnce with price vrition: bull mrket for vrition 1%, ber mrket for vrition -1% nd flt mrket for vrition in between. Investors cn be either in short or long positions towrds the stock being nlyzed or cn be off the mrket. In long position, the investor will hve bought stock shres corresponding to fixed mount of money M. In short position, the investor will hve borrowed nd immeditely sold stock shres corresponding to the mount of money M. In long positions, profit is proportionl the stock price vrition nd in short positions it is proportionl to the symmetric of price vrition. For the ske of simplicity, we neither consider fees nor the combintion with the purchse of cll options to limit losses in short positions. We ssume tht profits nd losses re ccumulted when the investor clers one position nd ssumes nother, so tht the mount of money invested is lwys M, tht is, profits re not reinvested nd the investor lwys hve money to invest the mount M, even if he or she hd previous losses. The possible ctions re Buy, Sell, DoubleBuy, DoubleSell nd Nothing. Buy nd Sell ctions correspond, respectively, to buying nd selling n mount of money M in stock shres. DoubleBuy nd DoubleSell ctions correspond to n mount 2M nd re used to directly chnge the investor s position from short to long nd vice-vers. Action Nothing corresponds to keeping the investor s position. The ssignment of rewrds to sttes is essentil to specify how profitble ech stte is. It is highly desirble to be in long position in bull mrket. In contrst, in ber mrket, we should try to ssume ber position. When the mrket is flt, it might be preferble to sty off the stock mrket, voiding clering costs nd loss of other investment opportunities. As shown in Tble 1, we ssigned high positive rewrds to sttes where the investor is certinly in the most profitble position nd high negtive rewrds to sttes in which the investor is in position tht certinly cuses losses. When trends re not so cler, smller rewrds were ssigned. Rewrds when the investor is off the mrket re close to zero, being slightly positive if the mrket is relly flt nd slightly negtive when there is cler trend tht the investor could hve tken dvntge. The initil ssignment shown in Tble 1 ws used in the ppliction of the model described in Section 6, but they re not fixed nd cn be chnged in order to study models tht better correspond to given context. For the ske of simplicity, we lso decided not to ssign costs to the ctions, but they cn be ssigned typiclly to tke into considertion tht fees hve to be pid to brokers when stocks re bought nd sold. Tble 1 Rewrds for sttes bsed on triple <previous 6-dy trend, following 6-dy trend, investor s position> bull, bull, long 10 bull, bull, short -10 bull, bull, off -2 bull, flt, long 5 bull, flt, short -2 bull, flt, off 0 bull, ber, long 0 bull, ber, short 5 bull, ber, off 0 flt, bull, long 5 flt, bull, short -5 flt, bull, off 0 flt, flt, long 0 flt, flt, short 0 flt, flt, off 2 flt, ber, long -5 flt, ber, short 5 flt, ber, off 0 ber, bull, long 5 ber, bull, short 0 ber, bull, off 0 ber, flt, long -2 ber, flt, short 5 ber, flt, off 0 ber, ber, long -10 ber, ber, short 10 ber, ber, off -2 There re some trnsition restrictions due to the investor s position. Actions Buy or Sell cn be executed only if the investor s position is not lredy long or short, respectively. On the other hnd, ctions DoubleBuy nd DoubleSell cn be executed only if the investor s position is short or long, respectively. The set of sttes tht cn be reched fter n ction is lso limited. After n ction DoubleSell, for instnce, the investor s position is certinly short. This mens tht from one stte, other 9 distinct sttes cn be reched, vrying only on the trends for the previous nd following dys. As lredy mentioned, the probbility for ech trnsition is estimted bsed on the historic of stock prices. The model described in this section ws simple but enbled us to lredy obtin interesting results s shown in the next section. More sophisticted models cn however be pplied to study the importnce of vrious vribles for decision mking. Among them, we cn point out dily volumes, correltions between different stocks nd indexes, nd correltions between price vritions in long-term nd short-term intervls.

5. APPLYING THE BASIC MODEL The experiments were undertken using Cssndr s POMDP Plnner. We used the Finite Grid POMDP lgorithm (instnce of PBVI) [18]. Two periods from Bovesp 1 Index (IBOV) were selected. The first one, from Jn. 2 nd., 2000 to Dec. 30 th, 2007 ws used during the sttisticl lerning to estimte probbilities. During this period, mny importnt events impcted on the mrket, such s the dot-com bubble, Sept. 11 th. terrorist ttcks nd the Brzilin presidentil election crisis in 2002. The other period, from Jn. 2 nd, to Jun, 30 th, ws chosen for the simultion. It ws selected bsed on the fct tht 3 different trends occurred. In the first semester of there ws flt trend. In the second semester of, the subprime crisis occurred nd there ws berish trend. In the first semester of there ws crisis recovery ( bullish period gin). We modeled the sttes of our POMDP s described in Section 5. In this evlution, we chose to experiment with different sensors, ech one bsed on different Technicl Anlysis indictor. In this wy, we intended to compre results for different indictors nd to compre the doption of the generted policies with the immedite obedience to the clssicl interprettion of ech indictor. We performed simultions bsed on the following indictors: two versions of the Moving Averge Convergence/Divergence indictor (MACD1 nd MACD2), two versions of the Reltive Strength Index (RSI1 nd RSI2) nd two versions of the Stochstic Oscilltor (SO1 nd SO2). Different versions of sme indictor vry bsed on different strtegies of interprettion. Simultions generted policies with number of belief sttes close to 1000, ech one estblishing probbility rnges for ech underlying stte. The policy mps ech belief stte nd observtion to n ction nd indictes the new belief stte tht will be reched fter executing the ction. During the simultion, observtions re continuously provided for ech dy nd the execution of the ction determined by the policy is simulted. Whenever long or short opertion is closed, the corresponding result is ccumulted. If, on the lst dy, there is n open position the simultion forces its closure. The results obtined for ech indictor s well s the vrition of the Bovesp Index (IBOV) re described in Tble 2. All indictors, except SO2, performed better thn IBOV, but some indictors did much better thn others. In prticulr, in simultions bsed on RSI2 nd SO1 we were ble to mke profits even during the crisis. Opertions in which the investor ssumed short position were importnt to minimize losses nd even mke profits during the crisis. Opertions in which the investor ssumed long positions showed to be importnt in prticulr in the recovery period fter the crisis. Tbles 3 nd 4 present the results of long nd short opertions, respectively. It is interesting to note tht, for ll indictors, the results obtined by using our POMDP model were much better thn simply immeditely following the indictions of the sensors, s demonstrted in Tble 5. 1 Bovesp (Bols de Vlores de São Pulo) is the min Brzilin stock exchnge which is bsed in São Pulo city. Indictors Tble 2 Plnner Results x Ibovesp Index Jn-Jun Totl RSI1 2.06% -10.50% 55.72% 47.27% RSI2 14.57% 14.39% 38.88% 67.84% MACD1 20.80% 0.00% 4.38% 25.18% MACD2 19.77% -59.46% 28.72% -10.97% SO1 10.06% 28.78% 51.83% 90.67% SO2 2.14% -84.37% 55.44% -26.79% IBOV -1.35% -36.88% 39.69% -10.80% Indictors Tble 3 Results for Long Opertions Jn-Jun Totl RSI1 1.19% -30.47% 52.22% 22.94% RSI2 9.22% -19.96% 42.56% 31.82% MACD1 15.64% 0.00% 2.80% 18.44% MACD2 19.61% -63.79% 37.50% -6.69% SO1 4.42% -11.32% 49.52% 42.62% SO2-0.03% -64.39% 52.44% -11.98% Indictors Tble 4 Results for Short Opertions Jn-Jun Totl RSI1 0.87% 19.96% 3.50% 24.33% RSI2 5.35% 34.34% -3.68% 36.02% MACD1 5.16% 0.00% 1.58% 6.74% MACD2 0.16% 4.33% -8.78% -4.29% SO1 5.64% 40.10% 2.31% 48.05% SO2 2.17% -19.98% 3.00% -14.81% Tble 5 Results for immedite obedience to indictors Indictors Jn-Jun Totl RSI1 1.67% 0.00% -5.93% -4.26% RSI2 14.01% -28.14% 25.38% 11.25% MACD1 1.98% 0.00% -6.36% -4.38% MACD2 14.71% -65.53% 11.80% -33.13% SO1-7.37% -69.06% 5.98% -70.45% SO2 4.67% -86.77% 0.00% -82.10% 6. CONCLUDING REMARKS In this pper, we proposed n pproch to generte nd simulte stock investment policies. The pproch is bsed on modeling investment contexts s POMDPs nd using Technicl Anlysis to provide observtions bout mrket trends. We hve implemented prototype tool tht llows users to study vrious models nd Technicl Anlysis concepts. We hve lso run the tool with bsic investment model tht we proposed to test our pproch nd obtined some interesting results. The combined use of Technicl Anlysis with POMDPs hs produced much better results thn simply following Technicl

Anlysis indictors. Results hve confirmed so fr our hypothesis tht POMDPs cn enhnce profitbility, becuse indictors re not perfect. In our pproch, we combine ctions tking into ccount the probbilities they hve shown of identifying mrket trends. Our pproch hs lso shown to be useful to compre Technicl Anlysis concepts, so tht n investor cn choose the one tht seems to produce better results for certin stock. In the continution of this project, we intend to pply the tool to other stocks nd indexes nd check whether other models for the sttes of our POMDP cn produce better results. It deserves specil ttention the combintion of long-term nd short-term trends for composing stte. Other Technicl Anlysis indictors nd grphicl ptterns s well s their combintion to crete sensors re lso still to be investigted. In ddition, we intend to study how our tool cn be dpted to cope with more complex investment contexts, such s those in which the user hs limited resources, wnts to limit risks or wnts to combine investments in vrious stocks nd options. 7. REFERENCES [1] Appel, G. The Moving Averge Convergence-Divergence Trding Method( Advnced Version). Trders Pr, 1985 [2] Bellmn, R. Dynmic Progrmming. Princeton University Press, 1957 [3] Cssndr, A. R., Kelbling, L. P., nd Littmn, M. L. Acting optimlly in prtilly observble stochstic domins. In Proceedings of the Twelfth Ntionl Conference on Artificil Intelligence, (AAAI) Settle, WA, 1994 [4] Cssndr, A.R. Exct nd Approximte Algorithms for Prtilly Observble Mrkov Decision Processes. Ph.D. Thesis. Brown University, Deprtment of Computer Science, Providence, RI, 1998 [5] Dvey, N., Hunt, S.P. nd Frnk, R.J. Time Series Prediction nd Neurl Networks, Kluwer Acdemic Publishers Journl of Intelligent nd Robotic Systems rchive, Volume 31: My /July,2001 [6] Egeli, B., Ozturn, M. nd Bdur, B. Stock Mrket Prediction Using Artificil Neurl Networks, Hwii Interntionl Conference on Business, Honolulu, Hwii, USA, 2003 [7] Elder, T. Creting Algorithmic Trders with Hierrchicl Reinforcement Lerning, MSc. Disserttion, School of Informtics, University of Edinburgh, [8] Hmilton, W. The Stock Mrket Brometer: A Study of its Forecst Vlue Bsed on Chrles H. Dow s Theory of the Price Movement.. New York, NY: John Wiley & Sons,Inc., 1998 (reprint of 1922 edition) [9] Kirkptrick, C. nd Dhlquist, J. Technicl Anlysis: the Complete Resource for Finncil Mrket Technicins. First. FT Press, 2006 [10] Kurniwti, H., Hsu, D., nd Lee, W.S. SARSOP: Efficient point-bsed POMDP plnning by pproximting optimlly rechble belief spces. In Proc. Robotics: Science nd Systems, [11] Lmbert, D. Commodity chnnel index: Tool for trding cyclic trends, Technicl Anlysis of Stocks & Commodities, Volume 1: July/August, 1983 [12] Lne, G. C.. Lne s stochstics: the ultimte oscilltor. Journl of Technicl Anlysis, My, 1985 [13] Lezos, G., nd Tull, M. Neurl network & fuzzy logic techniques for time series forecsting. Conference on Computtionl Intelligence for Finncil Engineering, New York, NY, 1999 [14] Lin, L., Co, L., Wng, J. nd Zhng, C. The Applictions of Genetic Algorithms in Stock Mrket Dt Mining Optimistion, Fifth Interntionl Conference on Dt Mining, Text Mining nd their Business Applictions, Mlg, Spin, 2004 [15] Murphy, J.J. Technicl Anlysis of the Finncil Mrkets: A Comprehensive Guide to Trding Methods nd Applictions. Prentice Hll Press, 1999 [16] Nu, D., Ghllb, M., nd Trverso, P. Automted Plnning: Theory & Prctice. Morgn Kufmnn Publishers Inc., 2004 [17] Nison, S. Jpnese Cndlestick Chrting Techniques, Second Edition. Prentice Hll Press., 2001 [18] Pineu, J., Gordon, G., nd Thrun, S.. Point-bsed vlue itertion: An nytime lgorithm for POMDPs. In Proc. Int. Joint Conf. on Artificil Intelligence, Acpulco, Mexico, 2003 [19] Putermn, M.L. Mrkov Decision Processes : DiscreteStochstic Dynmic Progrmming. New York : John Wiley & Sons, 1994 [20] Smith, T. nd Simmons, R. Heuristic serch vlue itertion for POMDPs. In Proceedings of the 20th Conference on Uncertinty in Artificil intelligence (Bnff, Cnd, July 07-11, 2004). ACM Interntionl Conference Proceeding Series, vol. 70. AUAI Press, Arlington, Virgini, 520-527, 2004 [21] Smith, T., Thompsom,D.R., nd Wettergreen,D.S. Generting exponentilly smller POMDP models using conditionlly irrelevnt vrible bstrction. In Proc. Int. Conf. on Applied Plnning nd Scheduling (ICAPS), 2007 [22] Wilder, J.W. New Concepts in Technicl Trding Systems. Greensboro, SC: Trend Reserch, 1978 [23] Zhedim, R. nd Chong, E., Portfolio Mngement using Prtilly Observble Mrkov Decision Processes, Colordo Stte University Informtion Science nd Technology Reserch Colloquium, 2005