Stock Trading with Recurrent Reinforcement Learning (RRL)
CS229 Application Project
Gabriel Molina, SUID 555783
I. INTRODUCTION

One relatively new approach to financial trading is to use machine learning algorithms to predict the rise and fall of asset prices before they occur. An optimal trader would buy an asset before the price rises and sell the asset before its value declines. For this project, an asset trader is implemented using recurrent reinforcement learning (RRL). The algorithm and its parameters are taken from a paper by Moody and Saffell [1]. It is a gradient ascent algorithm which attempts to maximize a utility function known as Sharpe's ratio. By choosing an optimal parameter vector w for the trader, we attempt to take advantage of asset price changes. Test examples of the asset trader's operation, both real-world and contrived, are illustrated in the final section.

II. UTILITY FUNCTION: SHARPE'S RATIO

One commonly used metric in financial engineering is Sharpe's ratio. For a time series of investment returns, Sharpe's ratio can be calculated as

    S_T = \frac{\mathrm{Average}(R_t)}{\mathrm{StandardDeviation}(R_t)}, \quad t = 1, \ldots, T

where R_t is the return on investment for trading period t. Intuitively, Sharpe's ratio rewards investment strategies that rely on less volatile trends to make a profit.

III. TRADER FUNCTION

The trader will attempt to maximize Sharpe's ratio for a given price time series. For this project, the trader function takes the form of a neuron:

    F_t = \tanh(w^T x_t)

where M is the number of time series inputs to the trader, the parameter vector is w \in \mathbb{R}^{M+2}, the input vector is x_t = [1, r_t, r_{t-1}, \ldots, r_{t-M}, F_{t-1}], and the return is r_t = p_t - p_{t-1}. Note that r_t is the difference in value of the asset between the current period and the previous period; therefore, r_t is the return on one share of the asset bought at time t-1. The function F_t \in [-1, 1] represents the trading position at time t. There are three types of positions that can be held: long, short, or neutral.

A long position is when F_t > 0. In this case, the trader buys the asset at price p_t and hopes that it appreciates by period t+1. A short position is when F_t < 0. In this case, the trader sells an asset it does not own at price p_t, with the obligation to produce the shares at period t+1. If the price at t+1 is higher, the trader is forced to buy at the higher price to fulfill the contract; if the price at t+1 is lower, the trader has made a profit.

A neutral position is when F_t = 0. In this case, the outcome at time t+1 has no effect on the trader's profits; there is neither gain nor loss. Thus F_t determines the holdings at period t: the trader buys (long position) or sells (short position) \mu |F_t| shares, where \mu is the maximum possible number of shares per transaction.

[1] J. Moody and M. Saffell, "Learning to Trade via Direct Reinforcement," IEEE Transactions on Neural Networks, Vol. 12, No. 4, July 2001.
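To make these definitions concrete, here is a minimal sketch written for this summary rather than taken from the paper. It assumes NumPy; the function names and the random-walk price series in the usage example are illustrative assumptions only, and the weight vector is simply sized to match the input vector x_t defined above.

```python
# Minimal sketch (not the paper's code): Sharpe's ratio of a return series
# and the neuron-style trader F_t = tanh(w^T x_t) with the input layout
# x_t = [1, r_t, ..., r_{t-M}, F_{t-1}] described in the text.
import numpy as np

def sharpe_ratio(R):
    """Average return divided by its standard deviation."""
    R = np.asarray(R, dtype=float)
    return R.mean() / R.std()

def trader_positions(r, w, M):
    """Run the recurrent trader over a return series r_t = p_t - p_{t-1}.

    w holds one weight per element of x_t: a bias weight, weights for the
    returns r_t ... r_{t-M}, and one weight for the previous position F_{t-1}.
    Returns the sequence of positions F_t in [-1, 1].
    """
    T = len(r)
    F = np.zeros(T)
    F_prev = 0.0                                   # start from a neutral position
    for t in range(M, T):
        x_t = np.concatenate(([1.0], r[t - M:t + 1][::-1], [F_prev]))
        F[t] = np.tanh(w @ x_t)                    # long if > 0, short if < 0
        F_prev = F[t]
    return F

# Usage example on a synthetic random-walk price series (assumed data).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = np.cumsum(rng.normal(0.0, 1.0, 500)) + 100.0   # placeholder prices
    r = np.diff(p)
    M = 8
    w = rng.normal(0.0, 0.1, M + 3)                    # one weight per x_t entry
    F = trader_positions(r, w, M)
    print("last position F_t:", F[-1])
```

Because x_t contains the previous position F_{t-1}, the positions have to be computed sequentially rather than in one vectorized pass; this same recurrence is what makes the gradient recurrent in Section IV.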
The return at time t, given the decision F_{t-1}, is

    R_t = \mu \left( F_{t-1} r_t - \delta \, |F_t - F_{t-1}| \right)

where \delta is the cost per share of a transaction at period t. If F_t = F_{t-1} (i.e. no change in our investment this period) there is no transaction penalty; otherwise the penalty is proportional to the difference in shares held. The first term, \mu F_{t-1} r_t, is the return resulting from the investment decision made at period t-1. For example, if the decision was to buy half the maximum allowed number of shares (F_{t-1} = 0.5) and each share increased by r_t = 8 price units, this term would be 0.5 \cdot 8 \cdot \mu = 4\mu, the total return for the period (ignoring any transaction penalty incurred during period t).

IV. GRADIENT ASCENT

Maximizing Sharpe's ratio requires a gradient ascent. First, we rewrite the utility function using the basic formulas for mean and variance:

    S_T = \frac{E[R_t]}{\sqrt{E[R_t^2] - (E[R_t])^2}} = \frac{A}{\sqrt{B - A^2}}, \quad A = \frac{1}{T}\sum_{t=1}^{T} R_t, \quad B = \frac{1}{T}\sum_{t=1}^{T} R_t^2

Then we can take the derivative of S_T with respect to the weights w using the chain rule:

    \frac{dS_T}{dw} = \sum_{t=1}^{T} \left( \frac{\partial S_T}{\partial A}\frac{\partial A}{\partial R_t} + \frac{\partial S_T}{\partial B}\frac{\partial B}{\partial R_t} \right) \left( \frac{\partial R_t}{\partial F_t}\frac{dF_t}{dw} + \frac{\partial R_t}{\partial F_{t-1}}\frac{dF_{t-1}}{dw} \right)

with

    \frac{\partial S_T}{\partial A} = \frac{B}{(B - A^2)^{3/2}}, \quad \frac{\partial S_T}{\partial B} = -\frac{A}{2(B - A^2)^{3/2}}, \quad \frac{\partial A}{\partial R_t} = \frac{1}{T}, \quad \frac{\partial B}{\partial R_t} = \frac{2 R_t}{T}

The necessary partial derivatives of the return function are

    \frac{\partial R_t}{\partial F_t} = -\mu \delta \, \mathrm{sgn}(F_t - F_{t-1}), \quad \frac{\partial R_t}{\partial F_{t-1}} = \mu r_t + \mu \delta \, \mathrm{sgn}(F_t - F_{t-1})
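These quantities map directly onto short helper functions. The sketch below is again my own, not the paper's code; mu and delta stand for \mu and \delta, and the remaining factors \partial A/\partial R_t = 1/T and \partial B/\partial R_t = 2R_t/T are applied inside the training loop shown in the next section.

```python
# A small sketch of the cost-adjusted return R_t = mu*(F_{t-1} r_t - delta*|F_t - F_{t-1}|)
# and of the derivative terms listed above.
import numpy as np

def returns_with_costs(F, r, mu=1.0, delta=0.001):
    """Trading returns R_t for positions F_t and price changes r_t."""
    F_prev = np.concatenate(([0.0], F[:-1]))       # F_{t-1}, neutral at the start
    return mu * (F_prev * r - delta * np.abs(F - F_prev))

def sharpe_and_moment_grads(R):
    """Sharpe ratio S = A / sqrt(B - A^2) and its derivatives w.r.t. A and B."""
    A, B = R.mean(), (R ** 2).mean()
    denom = (B - A ** 2) ** 1.5
    S = A / np.sqrt(B - A ** 2)
    dS_dA = B / denom
    dS_dB = -A / (2.0 * denom)
    return S, dS_dA, dS_dB

def return_grads(F, r, mu=1.0, delta=0.001):
    """dR_t/dF_t and dR_t/dF_{t-1} from the formulas above."""
    F_prev = np.concatenate(([0.0], F[:-1]))
    s = np.sign(F - F_prev)
    return -mu * delta * s, mu * r + mu * delta * s

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r = rng.normal(0.0, 1.0, 200)                              # placeholder returns
    F = np.tanh(np.cumsum(rng.normal(0.0, 0.1, 200)))          # placeholder positions
    R = returns_with_costs(F, r)
    print(sharpe_and_moment_grads(R))
```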
Then, the partial derivatives dF_t/dw and dF_{t-1}/dw must be calculated. Since F_t = \tanh(w^T x_t) and x_t contains F_{t-1}, differentiating gives the recurrence

    \frac{dF_t}{dw} = \left(1 - \tanh^2(w^T x_t)\right)\left( x_t + w_{M+2} \, \frac{dF_{t-1}}{dw} \right)

where w_{M+2} is the weight attached to the F_{t-1} input. Note that the derivative dF_t/dw is recurrent and depends on all previous values of dF/dw. This means that to train the parameters, we must keep a record of dF_t/dw from the beginning of our time series. Because the stock data series involved are at most a few thousand samples long, this slows down the gradient ascent but does not present an insurmountable computational burden. An alternative is to use online learning and approximate dF_{t-1}/dw using only the previous term, effectively making the algorithm a stochastic gradient ascent, as in Moody and Saffell's paper. However, my chosen approach is to use the exact expressions written above.

Once the dS_T/dw term has been calculated, the weights are updated according to the gradient ascent rule

    w_i \leftarrow w_i + \rho \, \frac{\partial S_T}{\partial w_i}

where \rho is the learning rate. The process is repeated for N_e iterations, where N_e is chosen to ensure that Sharpe's ratio has converged.

V. TRAINING

The most successful method in my exploration has been the following algorithm:

1. Train the parameters w \in \mathbb{R}^{M+2} using a historical window of T samples.
2. Use the resulting policy w to make real-time decisions for the next N_predict periods.
3. After the N_predict predictions are complete, repeat from step 1.

Intuitively, the stock price has underlying structure that changes as a function of time. Choosing a large T assumes the stock price's structure does not change much over T samples. In the random process example below, T and N_predict are large because the structure of the process is constant. If long-term trends do not appear to dominate stock behavior, then it makes sense to reduce T, since shorter windows can be a better choice than training on large amounts of past history. For example, IBM data for the years 1980-2006 might not lead to a good strategy for use in Dec. 2006; a more accurate policy would likely result from training on data from 2004-2006.

VI. EXAMPLE

Figure 1. Training results for an autoregressive random process (top: generated price series p(t); bottom: Sharpe's ratio versus training iteration).

The first example of training a policy is executed on an autoregressive random process (randomness is introduced by injecting Gaussian noise into coupled equations). In Figure 1, the top graph is the generated price series. The bottom graph is Sharpe's ratio on the time series using the parameter w obtained at each iteration of training. So, as training progresses, we find better values of w until we have achieved an optimal Sharpe's ratio for the given data.
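For reference, the following is a compact sketch of the full training loop described in Sections III-V: the recurrent forward pass, the exact recurrent gradient dF_t/dw, and the gradient-ascent update repeated for a chosen number of epochs. It is my own illustration under the same NumPy assumption, not the code used to produce the figures; the names and defaults (mu, delta, rho, n_epochs) are placeholders.

```python
import numpy as np

def train_rrl(r, M=8, mu=1.0, delta=0.001, rho=0.1, n_epochs=100, seed=0):
    """Gradient-ascent training of the trader weights on a return series r.
    Returns the weight vector and the Sharpe ratio recorded at each epoch."""
    rng = np.random.default_rng(seed)
    r = np.asarray(r, dtype=float)
    r = (r - r.mean()) / r.std()               # scale returns to mean 0, variance 1
    T = len(r)
    dim = M + 3                                # bias, r_t ... r_{t-M}, F_{t-1}
    w = rng.normal(0.0, 0.1, dim)
    sharpe_history = []

    for _ in range(n_epochs):
        # Forward pass: positions F_t and recurrent gradients dF_t/dw.
        F = np.zeros(T)
        dF = np.zeros((T, dim))
        x = np.zeros((T, dim))
        for t in range(M, T):
            x[t] = np.concatenate(([1.0], r[t - M:t + 1][::-1], [F[t - 1]]))
            F[t] = np.tanh(w @ x[t])
            # dF_t/dw = (1 - F_t^2) * (x_t + w_last * dF_{t-1}/dw)
            dF[t] = (1.0 - F[t] ** 2) * (x[t] + w[-1] * dF[t - 1])

        # Returns with transaction costs.
        F_prev = np.concatenate(([0.0], F[:-1]))
        R = mu * (F_prev * r - delta * np.abs(F - F_prev))

        # Sharpe ratio and its gradient via the chain rule.
        A, B = R.mean(), (R ** 2).mean()
        denom = (B - A ** 2) ** 1.5
        dS_dA, dS_dB = B / denom, -A / (2.0 * denom)
        dS_dR = dS_dA / T + dS_dB * 2.0 * R / T            # dS/dA*dA/dR + dS/dB*dB/dR
        sgn = np.sign(F - F_prev)
        dR_dF = -mu * delta * sgn
        dR_dFprev = mu * r + mu * delta * sgn
        dF_prev = np.vstack((np.zeros(dim), dF[:-1]))
        dS_dw = (dS_dR[:, None] * (dR_dF[:, None] * dF
                                   + dR_dFprev[:, None] * dF_prev)).sum(axis=0)

        w = w + rho * dS_dw                                 # gradient ascent step
        sharpe_history.append(A / np.sqrt(B - A ** 2))

    return w, np.array(sharpe_history)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    p = np.cumsum(rng.normal(0.0, 1.0, 1000)) + 100.0       # placeholder price series
    w, sharpe = train_rrl(np.diff(p))
    print("Sharpe ratio, first vs last epoch:", sharpe[0], sharpe[-1])
```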
Then, we use this optimal w parameter to form predictions for the next N_predict data samples, shown below:

Figure 2. Prediction performance using the optimal policy from training, over the next N_predict samples.

As is apparent from the graph, the trader is making decisions based on the w parameter. Of course, w is suboptimal for the time series over this prediction interval, but it does better than a monkey: after the N_predict intervals, the cumulative return is a net profit.
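The prediction phase itself is simple: the trained w is held fixed and the trader is run forward on the unseen return series. A small sketch under the same assumptions as before (w laid out as in the training sketch, and the new returns scaled the same way as the training data):

```python
import numpy as np

def trade_out_of_sample(w, r_new, M=8, mu=1.0, delta=0.001, F_start=0.0):
    """Apply a fixed, previously trained weight vector w to unseen returns
    r_new and report the decisions and the cumulative profit."""
    r = np.asarray(r_new, dtype=float)
    T = len(r)
    F = np.zeros(T)
    F_prev = F_start
    for t in range(M, T):
        x_t = np.concatenate(([1.0], r[t - M:t + 1][::-1], [F_prev]))
        F[t] = np.tanh(w @ x_t)                 # decision for period t, no retraining
        F_prev = F[t]
    F_lag = np.concatenate(([F_start], F[:-1]))
    R = mu * (F_lag * r - delta * np.abs(F - F_lag))   # per-period profit
    return F, R.cumsum()                               # decisions, cumulative gain
```

In the windowed scheme of Section V, this step would alternate with retraining: trade for N_predict periods, then refit w on the most recent T samples.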
The next experiment, presented in the same format, is to predict real stock data containing some precipitous drops (Citigroup):

Figure 3. Training w on Citigroup stock data (top: price series p_t; bottom: Sharpe's ratio versus training iteration).

Figure 4. r_t (top), decisions F_t (middle), and cumulative percentage profit (bottom) for Citigroup.

Note that although the general policy is good, the precipitous drop in price (the downward spike in r_t) wipes out our gains around t = 75.

The recurrent reinforcement learner seems to work best on stocks that are constant on average yet fluctuate up and down. In such a case, there is less worry about a precipitous drop like the one in the example above. With a relatively constant mean stock price, the reinforcement learner is free to play the ups and downs.

The recurrent reinforcement learner does work, although it is tricky to set up and verify. One important trick is to properly scale the return series to mean zero and variance one; otherwise the neuron cannot separate the resulting data points.

VII. CONCLUSIONS

The primary difficulty with this approach is that certain stock events do not exhibit structure. As seen in the second example above, the reinforcement learner does not predict precipitous drops in the stock price and is just as vulnerable to them as a human. Perhaps it would be more effective if combined with a mechanism for predicting such precipitous drops. Other changes to the model might include adding stock volumes as features that could help in predicting rises and falls. Additionally, it would be useful to augment the model to incorporate fixed transaction costs, as well as less frequent transactions. For example, a model could be created that learns from long periods of data but only periodically makes a decision. This would reflect the case of a casual trader who participates in smaller-volume trades with fixed transaction costs. Because it is too expensive for small-time investors to trade every period under fixed transaction costs, a model with a periodic trading strategy would be more financially feasible for such users. It would probably be worthwhile to adapt this model to this sort of periodic trading and evaluate the results.

[2] C. Gold, "FX Trading via Recurrent Reinforcement Learning," Proceedings of the 2003 IEEE International Conference on Computational Intelligence for Financial Engineering, pp. 363-370, March 2003. Special thanks to Carl for email advice on the algorithm implementation.