Reprint from: BP Working Paper in Financial Economics Series (3) September 1997

A Trading System for FTSE-100 Futures Using Neural Networks and Wavelets

D L Toulson
S P Toulson
Intelligent Financial Systems Limited
ifs@if5.com
www.if5.com

ABSTRACT

In this paper, we shall examine the combined use of the Discrete Wavelet Transform [7] and regularised neural networks to predict intra-day returns of the LIFFE FTSE-100 index future. The Discrete Wavelet Transform (DWT) has recently been used extensively in a number of signal processing applications [5, 6]. In this work, we shall propose the use of a specialised neural network architecture (WEAPON) that includes within it a layer of wavelet neurons. These wavelet neurons serve to implement an initial wavelet transformation of the input signal, which in this case will be a set of lagged returns from the FTSE-100 future. We derive a learning rule for the WEAPON architecture that allows the dilations and positions of the wavelet nodes to be determined as part of the standard back-propagation of error algorithm. This ensures that the child wavelets used in the transform are optimal in terms of providing the best discriminatory information for the prediction task. We then examine how the predictions obtained from committees of WEAPON networks may be exploited to establish trading rules for adopting positions in the FTSE-100 Index Future using a Signal Thresholded Trading System (STTS). The STTS operates by combining predictions of the future return estimates of a financial time series over a variety of different prediction horizons. A set of trading rules is then determined that acts to optimise the risk adjusted performance (Sharpe Ratio) of the trading strategy using realistic assumptions for bid/ask spread, slippage and transaction costs.

1 INTRODUCTION

Over the past decade, the use of neural networks for financial and econometric applications has been widely researched. In particular, neural networks have been applied to the task of providing forecasts for various financial markets ranging from spot currencies to equity indexes. The implied use of these forecasts is often to develop systems to provide profitable trading recommendations. However, in practice, the success of neural network trading systems has been somewhat poor. This may be attributed to a number of factors. In particular, we can identify the following weaknesses in many approaches:

1. Data Pre-processing
Inputs to the neural network are often simple lagged returns (or even prices!). The dimension of this input information is often much too high in the light of the number of training samples likely to be available. Techniques such as Principal Components Analysis (PCA) and Discriminant Analysis can often help to reduce the dimension of the input data [11, 12]. In this paper, we present an alternative approach using the Discrete Wavelet Transform (DWT).

2. Model Complexity
Neural networks are often trained for financial forecasting applications without suitable regularisation techniques. Techniques such as Bayesian Regularisation [2, 3, 10] or simple weight decay help control the complexity of the mapping performed by the neural network and reduce the effect of over-fitting of the training data. This is particularly important in the context of financial forecasting due to the high level of noise present within the data.

3. Confusion Of Prediction And Trading Performance
Often researchers present results for financial forecasting in terms of root mean square prediction error or number of accurately forecast turning points. Whilst these values contain useful information about the performance of the predictor, they do not necessarily imply that a successful trading system may be based upon them. The performance of a trading system is usually dependent on the performance of the predictions at key points in the time series. This performance is not usually adequately reflected in the overall performance of the predictor averaged over all points of a large testing period.

We shall present a practical trading model in this paper that attempts to address each of these points.

2 THE PREDICTION MODEL

In this paper, we shall examine the use of committees of neural networks to predict future returns of the FTSE-100 Index Future over 15, 30, 60 and 90 minute prediction horizons. We shall then combine these predictions and determine from them a set of trading rules that will optimise risk adjusted performance (Sharpe Ratio). We shall use as input to each of the neural network predictors the previous 40 lagged minutely returns of the FTSE-100 Future. The required output shall be the predicted future return for the appropriate prediction horizon. This process is illustrated in Figure 1.

Figure 1: Predicting FTSE-100 Index Futures: 40 lagged returns are extracted from the FTSE-100 future time series. These returns are used as input to (WEAPON) MLPs.
Different MLPs are trained to predict the return of the FTSE-100 future 15, 30, 60 and 90 minutes ahead.

A key consideration concerning this type of prediction strategy is how to encode the 40 available lagged returns as a neural network input vector. One possibility would be to simply use all 40 raw inputs. The problem with this approach is the high dimensionality of the input vectors. This will require us to use an extremely large set of training examples to ensure that the parameters of the model (the weights of the neural network) may be properly determined. Due to computational complexities and the non-stationarity of financial time series, using extremely large training sets is seldom practical. A preferable strategy is to attempt to reduce the dimension of the input information.
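The pairing of lagged returns with a future return described above can be sketched as follows. This is only an illustrative sketch: the function names, the use of simple (rather than log) returns and the toy price series are assumptions, not details taken from the paper.

```python
from typing import List, Tuple


def simple_returns(prices: List[float]) -> List[float]:
    """One-step simple returns: r[k] = p[k+1] / p[k] - 1."""
    return [prices[k + 1] / prices[k] - 1.0 for k in range(len(prices) - 1)]


def make_examples(
    prices: List[float], n_lags: int, horizon: int
) -> List[Tuple[List[float], float]]:
    """Pair each window of n_lags one-step returns with the return
    realised over the following `horizon` steps."""
    rets = simple_returns(prices)
    examples = []
    # t indexes the most recent price known when the prediction is made.
    for t in range(n_lags, len(prices) - horizon):
        window = rets[t - n_lags:t]                     # returns ending at price t
        target = prices[t + horizon] / prices[t] - 1.0  # return over the horizon
        examples.append((window, target))
    return examples


# Toy usage with a made-up price series (2 lags, 2-step horizon).
toy_prices = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0]
toy_examples = make_examples(toy_prices, n_lags=2, horizon=2)
```

With one-minute prices, `n_lags=40` and `horizon=15` would correspond to the shortest prediction horizon discussed in the text.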
A popular approach to reducing the dimension of inputs to neural networks is to use a Principal Components Analysis (PCA) transform to reduce information redundancy in the input vectors due to inter-component correlations. However, as we are working with lagged returns from a single financial time series we know, in advance, that there is little (auto) correlation in the lagged returns. In other work [11, 12], we have approached the problem of dimension reduction through the use of Discriminant Analysis techniques. These techniques were shown to lead to significantly improved performance in terms of prediction ability of the trained networks. However, such techniques do not, in general, take any advantage of our knowledge of the temporal structure of the input components, which in this case will be sequential lagged returns. Such techniques are also implicitly linear in their assumptions of separability, which may not be generally appropriate when seeking an optimal set of inputs to (non-linear) neural networks. We shall consider, as an alternative means of reducing the dimension of the input vectors, the use of the Discrete Wavelet Transform.

3 THE DISCRETE WAVELET TRANSFORM (DWT)

Figure 2: The discrete wavelet transform

The Discrete Wavelet Transform [4, 5] has recently received much attention as a technique for the preprocessing of data in applications involving either the compact representation of the original data (i.e. data compression or factor analysis) or its use as a discriminatory basis for pattern recognition and regression problems. The transform functions by projecting the original signal onto a sub-space spanned by a set of child wavelets derived from a particular Mother wavelet. For example, let us select the Mother wavelet to be the Mexican Hat function

$$\psi(t) = \frac{2}{\sqrt{3}}\,\pi^{-1/4}\,(1 - t^2)\,e^{-t^2/2} \qquad (1)$$

The wavelet children are then dilated and translated forms of (1), i.e.

$$\psi_{\tau,\xi}(t) = \frac{1}{\sqrt{\xi}}\,\psi\!\left(\frac{t - \tau}{\xi}\right) \qquad (2)$$

Now, let us select a finite subset C from the infinite set of possible child wavelets. Let the members of the subset be identified by the discrete values of position and scale, $\tau_j$ and $\xi_j$, $j = 1, \ldots, K$, i.e.

$$C = \{\psi_{\tau_1,\xi_1}, \psi_{\tau_2,\xi_2}, \ldots, \psi_{\tau_K,\xi_K}\} \qquad (3)$$
The j-th component of the projection of the original signal x onto the K dimensional space spanned by the child wavelets is then

$$y_j = \langle x, \psi_{\tau_j,\xi_j} \rangle = \sum_i x_i\,\psi_{\tau_j,\xi_j}(i) \qquad (4)$$

The significant questions to be answered with respect to using the DWT to reduce the dimension of the input vectors to the neural network are firstly how many child wavelets should be used and, given that, what values of shift and dilation, $\tau_j$ and $\xi_j$, should be chosen? In this paper, we shall present a method of choosing a suitable set of child wavelets such that the transformation of the original data (i.e. the 40 lagged returns) will enhance the non-linear separability of different classes of signal (i.e. future positive and negative returns) whilst significantly reducing its dimension. We show how this may be achieved naturally by implementing the wavelet transform as a set of neurons contained in the first layer of a multi-layer perceptron.

4 THE WAVELET ENCODING A PRIORI ORTHOGONAL NETWORK (WEAPON)

The WEAPON architecture is shown below in Figure 3. The architecture is essentially a Multi Layer Perceptron (MLP) with an additional layer of special wavelet nodes. Each of these nodes represents a single child wavelet and its output response is simply the projection of the input vector onto that particular wavelet, i.e.

$$y_j = \langle x, \psi_{\tau_j,\xi_j} \rangle = \sum_i x_i\,\psi_{\tau_j,\xi_j}(i) \qquad (5)$$

where in this case, $y_j$ is the output of the j-th wavelet node, $x_i$ is the i-th component of the input vector and $\tau_j$ and $\xi_j$ are respectively the shifts and dilations associated with the j-th wavelet node. The scales and shifts for each wavelet node are optimised by including them as parameters within the usual backprop training algorithm. We can thus determine the optimal set of scales and shifts appropriate for the prediction task we wish to solve. Details of the derivation of backprop for the wavelet neurons are given in Appendix A. In addition to this training rule to optimise the shifts and scales of the wavelet nodes, we have also devised mechanisms to control both the orthogonality of the individual wavelets and the regularisation of the complexity of the network mapping as a whole. Details of this are again included in Appendix A.

Figure 3: The WEAPON architecture (40 lagged FTSE-100 returns, pseudo weights, wavelet nodes forming the DWT, then an MLP producing the 15, 30, 60 and 90 minute predictions)
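The forward pass of the wavelet layer described above can be illustrated with a short sketch. It assumes the Mexican Hat mother wavelet with the usual unit-norm constant, a 1/sqrt(dilation) normalisation of the child wavelets, and input components indexed 1 to n; the function names and the example shifts and dilations are hypothetical.

```python
import math
from typing import List, Sequence


def mexican_hat(t: float) -> float:
    """Mexican Hat mother wavelet (assumed unit-norm form)."""
    norm = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return norm * (1.0 - t * t) * math.exp(-0.5 * t * t)


def child_wavelet(i: float, shift: float, dilation: float) -> float:
    """Child wavelet: the mother wavelet translated by `shift` and
    dilated by `dilation`, scaled by 1/sqrt(dilation)."""
    return mexican_hat((i - shift) / dilation) / math.sqrt(dilation)


def wavelet_layer(
    x: Sequence[float], shifts: Sequence[float], dilations: Sequence[float]
) -> List[float]:
    """Each wavelet node outputs the projection of the input vector x
    onto its own child wavelet (inputs are indexed from 1)."""
    outputs = []
    for shift, dilation in zip(shifts, dilations):
        y = sum(
            x_i * child_wavelet(i, shift, dilation)
            for i, x_i in enumerate(x, start=1)
        )
        outputs.append(y)
    return outputs


# Toy usage: a 40-component input with a single spike at position 10,
# encoded by three wavelet nodes with illustrative parameters.
x = [0.0] * 40
x[9] = 1.0
features = wavelet_layer(x, shifts=[10.0, 20.0, 30.0], dilations=[2.0, 4.0, 8.0])
```

Here 40 inputs are reduced to three features, which would then feed the hidden layer of the MLP. In the architecture of the paper the shifts and dilations are learned rather than fixed by hand.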
5 PREDICTING FTSE-100 FUTURES USING WEAPON NETWORKS

5.1 The Data

We shall apply the network architecture and training rules described in the previous section to the task of predicting future returns of the FTSE-100 index futures quoted on LIFFE. The historical price data used was tick-by-tick quotes of actual trades supplied by LIFFE. The data was pre-processed to a 1-minutely format by taking the average volume adjusted traded price during each minute. Missing values were filled in by interpolation but were marked un-tradable. Prices were obtained in this manner for the whole of January 1995 - June 1996 to yield approximately 00,000 distinct prices. The entire data set was then divided into three distinct subsets: training/validation, optimisation and test. We trained and validated the neural network models on the first six months of 1995 data. The prediction performance results, quoted in this section, are the results of applying the neural networks to the second six months of the 1995 data. The STTS trading model parameters (described in the next section) were also optimised using this period. We reserved the whole of 1996 for out-of-sample trading performance test purposes.

Figure 4: The FTSE-100 future January 1995 to January 1996

5.2 Individual Predictor Performances

Tables 1 to 4 show the performances of four different neural network predictors for the four prediction horizons (15, 30, 60 and 90 minute ahead predictions). The predictors used were

1. A simple early-stopping MLP trained using all 40 lagged return inputs with an optimised number of hidden nodes found by exhaustive search (-3 nodes).
2. A standard weight decay MLP trained using all 40 lagged returns with the value of weight decay, lambda, optimised by cross validation.
3. An MLP trained with Laplacian weight decay and weight/node elimination (as in Williams [14]).
4. A WEAPON architecture using wavelet nodes, soft orthogonalisation constraints and Laplacian weight decay for weight/node elimination.

The performances of the architectures are shown in terms of

1. RMSE prediction error in terms of desired and actual network outputs.
2. Turning point accuracy: This is the number of times the network correctly predicts the sign of the future return.
3. Large turning point accuracy: This is the number of times that the network correctly predicts the sign of returns whose magnitude is greater than one standard deviation from zero (this measure is relevant in terms of expected trading system performance).
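The three performance measures listed above could be computed along the following lines. This is a sketch under stated assumptions: accuracies are returned as fractions rather than counts, a zero target or output is counted as a miss, and the "one standard deviation" threshold is taken to be the population standard deviation of the target returns. None of these conventions is confirmed by the text.

```python
import math
from statistics import pstdev
from typing import Sequence


def rmse(targets: Sequence[float], outputs: Sequence[float]) -> float:
    """Root mean square error between desired and actual outputs."""
    n = len(targets)
    return math.sqrt(sum((t - o) ** 2 for t, o in zip(targets, outputs)) / n)


def sign_accuracy(targets: Sequence[float], outputs: Sequence[float]) -> float:
    """Fraction of cases where the predicted sign matches the realised sign."""
    hits = sum(1 for t, o in zip(targets, outputs) if t * o > 0)
    return hits / len(targets)


def large_move_accuracy(targets: Sequence[float], outputs: Sequence[float]) -> float:
    """Sign accuracy restricted to targets whose magnitude exceeds one
    standard deviation of the targets (assumed threshold)."""
    threshold = pstdev(targets)
    pairs = [(t, o) for t, o in zip(targets, outputs) if abs(t) > threshold]
    if not pairs:
        return float("nan")
    return sign_accuracy([t for t, _ in pairs], [o for _, o in pairs])


# Toy usage with made-up returns and predictions.
toy_targets = [0.01, -0.02, 0.03, -0.001]
toy_outputs = [0.02, 0.01, 0.01, -0.002]
toy_scores = (
    rmse(toy_targets, toy_outputs),
    sign_accuracy(toy_targets, toy_outputs),
    large_move_accuracy(toy_targets, toy_outputs),
)
```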
Prediction horizon   % Accuracy   Large % Accuracy   RMSE
15                   507%         5477%              0003
30                   504%         5955%              0039379
60                   569%         546%               007403
90                   5%           508%               0085858
Table 1: Results for MLP using early stopping

Prediction horizon   % Accuracy   Large % Accuracy   RMSE
15                   5075%        570%               00533
30                   500%         5608%              003459
60                   5335%        5409%              006099
90                   548%         574%               08560
Table 2: Results for weight decay MLP

Prediction horizon   % Accuracy   Large % Accuracy   RMSE
15                   55%          4855%              000467
30                   546%         5434%              00356
60                   464%         4348%              0064493
90                   5039%        508%               009000
Table 3: Results for Laplacian weight decay MLP

Prediction horizon   % Accuracy   Large % Accuracy   RMSE
15                   537%         5743%              00879
30                   579%         569%               00344
60                   546%         5794%              006044
90                   55%          588%               0088
Table 4: Results for WEAPON

We conclude that the WEAPON architecture and the simple weight decay architecture appear significantly better than the other two techniques. The WEAPON architecture appears to be particularly good at predicting the sign of large market movements.

5.3 Use of Committees for Prediction

In the previous section, we presented prediction performance results using a single WEAPON architecture applied to the four required prediction horizons. A number of authors have suggested the use of linear combinations of neural networks as a means of improving the robustness of neural networks for forecasting and other tasks. The basic idea of a committee is to independently train a number of neural networks and to then combine their outputs. Suppose we have trained N neural networks and that the output of the i-th net is given by $y_i(\vec{x})$. The committee response is given by

$$\bar{y}(\vec{x}) = \sum_{i=1}^{N} \alpha_i\,y_i(\vec{x}) + \alpha_0 \qquad (6)$$

where $\alpha_i$ is the weighting for the i-th network and $\alpha_0$ is the bias of the committee. The weightings $\alpha_i$ may either be simple averages (Basic Ensemble Method) or may be optimised using an OLS procedure (Generalised Ensemble Method). Specifically, the OLS weightings may be determined by

$$\vec{\alpha} = \Xi^{-1}\,\vec{\Gamma} \qquad (7)$$
where $\Xi$ and $\vec{\Gamma}$ are defined in terms of the outputs of the individual trained networks and the training examples, i.e.

$$\Xi = [\xi_{ij}] = \frac{1}{T}\sum_{t=1}^{T} y_i(\vec{x}_t)\,y_j(\vec{x}_t), \qquad \vec{\Gamma} = [\gamma_i] = \frac{1}{T}\sum_{t=1}^{T} y_i(\vec{x}_t)\,t_t \qquad (8)$$

where $\vec{x}_t$ is the t-th input vector, $t_t$ is the corresponding target response and T is the number of training examples. Below, we show the prediction performances of committees composed of five independently trained WEAPON architectures, for each of the prediction horizons. We conclude that the performances (in terms of RMSE) are superior to those obtained using a single WEAPON architecture. Turning point detection accuracy, however, is broadly similar.

Prediction horizon   % Accuracy   Large % Accuracy   RMSE
15                   535%         577%               00734
30                   534%         5698%              0036
60                   5447%        577%               00559
90                   559%         5869%              00809
Table 5: Results for Committees of five independently trained WEAPON architectures

6 THE SIGNAL THRESHOLDED TRADING SYSTEM

6.1 Background

One might think that if we have a neural network or other prediction model correctly predicting the future direction of a market 60 percent of the time, then it would be relatively straightforward to devise a profitable trading strategy. In fact, this is not necessarily the case. In particular one must consider the following:

• What are the effective transaction costs that are incurred each time we execute a round-trip trade?
• Over what horizon are we making the predictions? If the horizon is particularly short term (i.e. 15-minute ahead predictions on intra-day futures markets), is it really possible to get in and out of the market quickly enough and, more importantly, to get the quoted prices? In terms of building profitable trading systems it may be more effective to have a lower accuracy but longer prediction horizons.
• What level of risk is being assumed by taking the indicated positions?
We may, for instance, want to optimise not just pure profit but perhaps some risk-adjusted measure of performance such as Sharpe Ratio or Sterling Ratio.

An acceptable trading system has to take account of some or all of the above considerations.

6.2 The Basic STTS Model

Assume we have P predictors making predictions about the expected FTSE-100 Futures returns. Each of the predictors makes predictions for a given number of time steps ahead. Let the prediction of the i-th predictor at time t be denoted by $p_i(t)$. We shall define the normalised trading signal S(t) at time t to be:
$$S(t) = \sum_{i=1}^{P} \omega_i\,p_i(t) \qquad (9)$$

where $\omega_i$ is the weighting given to the i-th predictor. An illustration of this is given in Figure 5.

Figure 5: Weighted summation of predictions from four WEAPON committee predictors (15, 30, 60 and 90 minutes) to give a single trading signal

We shall base the trading strategy on the strength of the combined trading signal S(t) at any given time t. At time t we compare the trading signal S(t) with two thresholds, denoted by α and β. These two thresholds are used for the following decisions:

• α is the threshold that controls when to open a long or short trade.
• β is the threshold used to decide when to close out an open long or short trade.

At any given time t, the trading signal will be compared with the appropriate threshold using the current trading position. In particular, details of the actions defined for each trading position are found in Table 6:

Current position   Test             Action: Go
Flat               if S(t) > α      Long
Flat               if S(t) < -α     Short
Long               if S(t) < -β     Flat
Short              if S(t) > β      Flat
Table 6: Using the trading thresholds to decide which action to take

Figure 6 demonstrates the concept of using the two thresholds for trading. The two graphs shown in Figure 6 are the trading signals S(t) for each time t (top graph) and the associated prices p(t) displayed in the bottom graph. The price graph is colour coded for the different trading positions that are recommended: blue for a long recommendation, red for a short trading recommendation and grey otherwise.

At the beginning of trading we are in a flat position. We shall open a trade if the absolute value of the trading signal exceeds α. At the time marked ❶ this is the case since the trading signal is greater than α. We shall open a long trade. Unless the trading signal falls below -β, this long trade will stay open. This condition is fulfilled at the time marked ❷, when we shall close out the long trade. We are now again in a flat position. At time ❸ the trading signal falls below -α, so we open a short trading position. This position is not closed out until the trading signal exceeds β, which occurs at time ❹ when the short trade is closed out.
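The threshold rules of Table 6 amount to a small state machine, which can be sketched as below. The encoding of positions as +1 (long), -1 (short) and 0 (flat), the function names and the example numbers are assumptions made for illustration. The sketch also ignores the execution delay and trading costs discussed in the results.

```python
from typing import List, Sequence


def combine_signal(predictions: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the individual predictions at one time step."""
    return sum(w * p for w, p in zip(weights, predictions))


def stts_positions(signal: Sequence[float], alpha: float, beta: float) -> List[int]:
    """Apply the two-threshold rules to a series of trading signals.
    Returns the position after each time step: +1 long, -1 short, 0 flat."""
    position = 0  # trading starts flat
    positions = []
    for s in signal:
        if position == 0:
            if s > alpha:
                position = 1       # flat -> long
            elif s < -alpha:
                position = -1      # flat -> short
        elif position == 1 and s < -beta:
            position = 0           # long -> flat
        elif position == -1 and s > beta:
            position = 0           # short -> flat
        positions.append(position)
    return positions


# Toy usage mirroring the sequence open long, close, open short, close.
toy_signal = [0.0, 0.6, 0.1, -0.3, -0.7, 0.0, 0.25]
toy_positions = stts_positions(toy_signal, alpha=0.5, beta=0.2)
```

Note that a closed trade is followed by at least one flat step in this sketch: a position is never reversed directly from long to short.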
Figure 6: Trading signals and prices (top: the signal s(t) with the thresholds ±α and ±β; bottom: the price, with the go long, go short and go flat decisions marked at ❶ to ❹)

7 RESULTS

An STTS trading system, as described above, was formed using as input 4 WEAPON committee predictors. Each committee contained five independently trained WEAPON networks and was trained to produce 15, 30, 60 and 90-minute ahead predictions, respectively. A screen-shot from the software used to perform this simulation (Amber) is shown below in Figure 7. The optimal values for the STTS thresholds α and β and the four STTS predictor weightings $\omega_i$ were found by assessing the performance of the STTS model on the optimisation data (last 6 months of 1995) using particular values for the parameters. The parameters were then optimised using simulated annealing with the objective function being the trading performance on this period measured in terms of Sharpe Ratio. In terms of trading conditions, it was assumed that there would be a three minute delay in opening or closing any trade and that the combined bid-ask spread / transaction charge for each round trip trade would be 8 points. Both are considered conservative estimates. After the optimal parameters for the STTS system were determined, the trading system was applied to the previously unseen data of the first half of 1996. Table 7 summarises the trading performance over the six-month test period in terms of overall profitability, trading frequency and Sharpe Ratio.

Monthly net profitability in ticks                53
Average monthly trading frequency (roundtrip)     8
Sharpe ratio daily (monthly)                      036 (048)
Table 7: Results of trading system on the unseen test period
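An objective of the kind used to tune the STTS parameters might be evaluated as in the sketch below. It is an illustration under several assumptions that the text does not confirm: the Sharpe Ratio is taken as mean over sample standard deviation with a zero risk-free rate and no annualisation, the whole round-trip cost is charged when a position is opened, and the three minute execution delay is not modelled. All names are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def net_pnl(
    positions: Sequence[int], prices: Sequence[float], round_trip_cost: float
) -> List[float]:
    """Per-step profit in price points. positions[t] is the position
    (+1, -1 or 0) held from price t to price t+1."""
    pnl = []
    for t in range(len(prices) - 1):
        step = positions[t] * (prices[t + 1] - prices[t])
        opened = positions[t] != 0 and (t == 0 or positions[t - 1] != positions[t])
        if opened:
            step -= round_trip_cost  # charge the full round trip on entry
        pnl.append(step)
    return pnl


def sharpe_ratio(returns: Sequence[float]) -> float:
    """Mean return divided by its sample standard deviation."""
    return mean(returns) / stdev(returns)


# Toy usage: one long trade held for two steps, with a made-up cost.
toy_pnl = net_pnl([0, 1, 1, 0], [100.0, 101.0, 103.0, 102.0], round_trip_cost=0.5)
toy_sharpe = sharpe_ratio(toy_pnl)
```

A parameter search such as simulated annealing would call a function like this repeatedly, once per candidate setting of the thresholds and weightings.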
Figure 7: The Trading System for FTSE-100 futures. 40 lagged returns are extracted from the FTSE-100 future time series and, after standardisation, input to the 20 WEAPON predictors, arranged in four committees. Each committee is responsible for a particular prediction horizon. The predictions are then combined for each committee and passed on to the STTS trading module.

8 CONCLUSION

We have presented a complete trading model for adopting positions in the LIFFE FTSE-100 Future. In particular, we have developed a system that avoids the three weaknesses that we identified in the introduction, namely

1. Data Pre-Processing
We have constrained the effective dimension of the 40 lagged returns by imposing a Discrete Wavelet Transform on the input data via the WEAPON architecture. We have also, within the WEAPON architecture, devised a method for automatically discovering the optimal number of wavelets to use in the transform and also which shifts and dilations should be used.

2. Regularisation
We have applied Bayesian regularisation techniques to constrain the complexity of the prediction models. We have demonstrated the requirement for this by comparing the prediction performances of regularised and unregularised (early-stopping) neural network models.

3. STTS Trading Model
The STTS model is designed to transform predictions into actual trading strategies. Its objective criterion is therefore not RMS prediction error but the risk adjusted profit of the trading strategy. The model has been shown to provide relatively consistent profits in simulated out-of-sample high frequency trading over a 6-month period.
9 BIBLIOGRAPHY

[1] D E Rumelhart, G E Hinton, R J Williams. Learning Internal Representations By Error Propagation. In Parallel Distributed Processing, Chapter 8. MIT Press, 1986.
[2] D J C MacKay. Bayesian Interpolation. Neural Comput. 4(3), 415-447, 1992.
[3] D J C MacKay. A Practical Bayesian Framework For Backprop Networks. Neural Comput. 4(3), 448-472, 1992.
[4] B A Telfer, H Szu, G J Dobeck. Time-Frequency, Multiple Aspect Acoustic Classification. World Congress on Neural Networks, Vol pp II-34 II-39, July 1995.
[5] D P Casasent, J S Smokelin. Neural Net Design of Macro Gabor Wavelet Filters for Distortion-Invariant Object Detection In Clutter. Optical Engineering, Vol 33, No 7, pp 64-70, July 1994.
[6] H Szu, B Telfer. Neural Network Adaptive Filters For Signal Representation. Optical Engineering 31, 1907-1916, 1992.
[7] I Daubechies. Orthonormal Bases of Compactly Supported Wavelets. Communications in Pure and Applied Mathematics, 1988, Vol 41, No 7, pp 909-996.
[8] K Fukunaga. Statistical Pattern Recognition (2nd Edition). Academic Press, 1990.
[9] M F Moller. A Scaled Conjugate Gradient Method For Fast Supervised Learning.
[10] W L Buntine, A S Weigend. Bayesian Back-Propagation. Complex Systems 5, 603-643.
[11] D L Toulson, S P Toulson. Use of Neural Network Ensembles for Portfolio Selection and Risk Management. Proc Forecasting Financial Markets, Third International Conference, London, 1996.
[12] D L Toulson, S P Toulson. Use of Neural Network Mixture Models for Forecasting and Application to Portfolio Management. Sixth International Symposium on Forecasting, Istanbul, 1996.
[13] S E Fahlman. Faster Learning Variations On Back-Propagation: An Empirical Study. Proceedings Of The 1988 Connectionist Models Summer School, pp 38-51. Morgan Kaufmann.
[14] P M Williams. Bayesian Regularisation and Pruning Using A Laplace Prior. Neural Computation, Vol 5, 1993.
[15] Y Meyer. Wavelets and Operators. Cambridge University Press, 1995.
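APPENDIX A

DERIVING BACKPROP FOR WEAPON

The MLP is usually trained using error backpropagation. Backprop requires the calculation of the partial derivatives of the data error $E_D$ with respect to each of the free parameters of the network (usually the weights and biases of the nodes). For the case of wavelet neurons, the weights between the neuron and the input pattern are not free but are constrained to assume discrete values of a particular child wavelet. The free parameters for the wavelet nodes are therefore not the weights, but the values of translation and dilation, $\tau$ and $\xi$. To optimise these parameters during training, we must obtain expressions for the partial derivatives of the error function with respect to these two wavelet parameters. The usual form of the backpropagation algorithm is:

$$\frac{\partial E}{\partial \omega_{i,j}} = \frac{\partial E}{\partial y_j}\,\frac{\partial y_j}{\partial \omega_{i,j}} \qquad (10)$$

The term $\partial E / \partial y_j$, often referred to as $\delta_j$, is the standard backpropagation of error term, which may be found in the usual way for the case of the wavelet nodes. The partial derivative $\partial y_j / \partial \omega_{i,j}$ must be substituted with the partial derivatives of the node output with respect to the wavelet parameters. For a given mother wavelet $\psi(x)$, consider the output of the wavelet node, given in Equation (4). Taking partial derivatives with respect to the translation and dilation yields:

$$\frac{\partial y_j}{\partial \tau_j} = -\,\xi_j^{-3/2} \sum_i x_i\,\psi'\!\left(\frac{i - \tau_j}{\xi_j}\right), \qquad \frac{\partial y_j}{\partial \xi_j} = -\,\frac{1}{2}\,\xi_j^{-3/2} \sum_i x_i\,\psi\!\left(\frac{i - \tau_j}{\xi_j}\right) - \xi_j^{-5/2} \sum_i x_i\,(i - \tau_j)\,\psi'\!\left(\frac{i - \tau_j}{\xi_j}\right) \qquad (11)$$

Orthogonalisation of the Wavelet Nodes

A potential problem with using wavelet nodes is that duplication in the parameters of some of the wavelet nodes may occur. One way of avoiding this type of duplication would be to apply a soft constraint of orthogonality on the wavelets of the hidden layer. This could be done by adding to the error function the term

$$E_W^{\psi} = \sum_{i} \sum_{j \neq i} \left\langle \psi_{\tau_i,\xi_i}, \psi_{\tau_j,\xi_j} \right\rangle^2 \qquad (12)$$

where $\langle \cdot, \cdot \rangle$ denotes the projection

$$\langle f, g \rangle = \sum_i f(i)\,g(i) \qquad (13)$$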
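The effect of a soft orthogonality term of this kind can be illustrated numerically. The sketch below assumes that the penalty sums the squared discrete inner products between all distinct pairs of child wavelets, evaluated at input positions 1 to 40, with the Mexican Hat mother wavelet and a 1/sqrt(dilation) normalisation. The function names and parameter values are hypothetical.

```python
import math
from typing import Sequence


def mexican_hat(t: float) -> float:
    """Mexican Hat mother wavelet (assumed unit-norm form)."""
    norm = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return norm * (1.0 - t * t) * math.exp(-0.5 * t * t)


def child(i: int, shift: float, dilation: float) -> float:
    """Child wavelet evaluated at input position i."""
    return mexican_hat((i - shift) / dilation) / math.sqrt(dilation)


def inner(shift_a: float, dil_a: float, shift_b: float, dil_b: float,
          n_inputs: int) -> float:
    """Discrete inner product of two child wavelets over positions 1..n_inputs."""
    return sum(
        child(i, shift_a, dil_a) * child(i, shift_b, dil_b)
        for i in range(1, n_inputs + 1)
    )


def orthogonality_penalty(shifts: Sequence[float], dilations: Sequence[float],
                          n_inputs: int = 40) -> float:
    """Sum of squared inner products over all ordered pairs of distinct nodes."""
    total = 0.0
    for a in range(len(shifts)):
        for b in range(len(shifts)):
            if a != b:
                total += inner(shifts[a], dilations[a],
                               shifts[b], dilations[b], n_inputs) ** 2
    return total


# Two duplicated nodes are penalised heavily; two well separated nodes are not.
duplicated = orthogonality_penalty([5.0, 5.0], [2.0, 2.0])
separated = orthogonality_penalty([5.0, 30.0], [2.0, 2.0])
```

Adding such a term to the training error pushes nodes with overlapping supports apart, which is the stated purpose of the constraint.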
In the previous section, backprop was derived in terms of the unregularised sum of squares data error term, $E_D$. We now add in an additional term for the orthogonality constraint to yield a combined error function M(W), given by

$$M(W) = \alpha E_D + \gamma E_W^{\psi} \qquad (14)$$

Weight and Node Elimination

A number of techniques have been suggested in the literature for node and/or weight elimination in neural networks. We shall adopt the technique proposed by Williams [14, 2, 3] and use a Laplacian prior as a natural method of eliminating redundant nodes. The Laplacian prior on the weights implies an additional term in the error function, i.e.

$$M(W) = \alpha E_D + \gamma E_W^{\psi} + \beta E_W \qquad (15)$$

where $E_W$ is defined as

$$E_W = \sum_{i,j} \left| \omega_{i,j} \right| \qquad (16)$$

A consequence of this prior is that during training, weights are forced to adopt one of two positions. A weight can either adopt equal data error sensitivity as all the other weights or is forced to zero. This leads to skeletonisation of a network. During this process, weights, hidden nodes or input components may be removed from the architecture. As the weights emerging from redundant wavelet nodes will have negligible data error sensitivity, this will cause them to be eliminated.
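The partial derivatives of a wavelet node output with respect to its translation and dilation, as derived earlier in this appendix, can be checked against finite differences. The sketch below assumes the Mexican Hat mother wavelet, a 1/sqrt(dilation) normalisation and inputs indexed from 1. The function names and test values are illustrative only.

```python
import math
from typing import Sequence, Tuple

NORM = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)


def psi(t: float) -> float:
    """Mexican Hat mother wavelet."""
    return NORM * (1.0 - t * t) * math.exp(-0.5 * t * t)


def dpsi(t: float) -> float:
    """First derivative of the Mexican Hat wavelet."""
    return NORM * (t ** 3 - 3.0 * t) * math.exp(-0.5 * t * t)


def node_output(x: Sequence[float], tau: float, xi: float) -> float:
    """Projection of x onto the child wavelet with shift tau and dilation xi."""
    return sum(
        x_i * psi((i - tau) / xi) / math.sqrt(xi)
        for i, x_i in enumerate(x, start=1)
    )


def node_gradients(x: Sequence[float], tau: float, xi: float) -> Tuple[float, float]:
    """Analytic derivatives of the node output with respect to tau and xi."""
    s_psi = sum(x_i * psi((i - tau) / xi) for i, x_i in enumerate(x, start=1))
    s_dpsi = sum(x_i * dpsi((i - tau) / xi) for i, x_i in enumerate(x, start=1))
    s_w = sum(
        x_i * (i - tau) * dpsi((i - tau) / xi) for i, x_i in enumerate(x, start=1)
    )
    d_tau = -(xi ** -1.5) * s_dpsi
    d_xi = -0.5 * (xi ** -1.5) * s_psi - (xi ** -2.5) * s_w
    return d_tau, d_xi


# Compare with central finite differences on a small made-up input.
x = [0.3, -0.1, 0.4, 0.2, -0.5, 0.1]
tau, xi, h = 3.2, 1.7, 1e-6
analytic = node_gradients(x, tau, xi)
numeric = (
    (node_output(x, tau + h, xi) - node_output(x, tau - h, xi)) / (2.0 * h),
    (node_output(x, tau, xi + h) - node_output(x, tau, xi - h)) / (2.0 * h),
)
```

In a full backprop implementation these derivatives would be multiplied by the error term propagated back to the node in order to update its shift and dilation.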