Genetic Algorithm Search for Predictive Patterns in Multidimensional Time Series



Arnold Polanski
School of Management and Economics, Queen's University of Belfast
25 University Square, Belfast BT7 1NN, United Kingdom
a.polanski@qub.ac.uk

Complex Systems, 19 © 2011 Complex Systems Publications, Inc.

Based on an algorithm for pattern matching in character strings, a pattern matching machine is implemented that searches for occurrences of patterns in multidimensional time series. Before the search process takes place, time series data is encoded in user-designed alphabets. The patterns, on the other hand, are formulated as regular expressions that are composed of letters from these alphabets and operators. Furthermore, a genetic algorithm is developed to breed patterns that maximize a user-defined fitness function. In an application to financial data, it is shown that patterns bred to predict high exchange rate volatility in training samples retain statistically significant predictive power in validation samples.

1. Introduction

This work is a contribution to the rapidly developing research area of data mining, a host of methods that aim at revealing hidden relationships and regularities in large sets of data. Of particular importance is the class of data mining problems concerned with discovering frequently occurring patterns in sequential data. We propose a versatile nonparametric technique for representing multidimensional data by encoding it in alphabets that are defined by the analyst. The encoded data is explored by means of patterns, which are composed of operators and letters from these alphabets. Since patterns are regular expressions, they can be automatically manipulated, combined, and evaluated. These operations lie at the heart of our genetic algorithm (GA), which evolves patterns in order to breed ever better descriptors and predictors of the data. A concise and flexible pattern description language is, therefore, a powerful tool for data mining that serves two purposes: on the one hand, as a language in which theories concerned with the underlying data generating process are formulated and tested and, on the other, as a forecasting instrument.

The present approach shows its special strength when dealing with multidimensional data that can be analyzed under multiple criteria and/or characterized by several indicators. Usually, each criterion (indicator) forms the base of an alphabet. Preprocessing the data by encoding it in alphabets ensures that the search for patterns unfolds efficiently. This is manifestly a precondition for a viable GA application when the algorithm evaluates patterns based on their matches. Furthermore, the possibility to design data-specific alphabets not only makes the method applicable to highly diverse record sets but also allows each researcher to analyze the (same) data with an idiosyncratic language.

We stress here an important departure from the more traditional techniques of forecasting complex systems. Many methods, like kernel regression, neural networks, or reinforcement learning (see [1] for recent developments), estimate the future output of a system as a function of a fixed number of past observations. In contrast, the present approach does not restrict the relevant past to time windows of fixed lengths. It is only important that the past state of the system and the system's response to that state frequently generate measurable outcomes with some, typically ex ante unknown, characteristics. These characteristics are encapsulated as patterns in a suitable language and searched for in the encoded time series.

The analysis and forecasting of multidimensional data is at the center of research in, for example, finance, electrical engineering, theoretical physics, and the computer sciences. Various mathematical methods have been proposed for the description and analysis of interdependencies in multivariate time series. An interested reader can find a recent overview of these methods (including Granger causality, directed transfer functions, and partial directed coherence) in [2]. In this work we are interested in an ex ante unknown type of multidimensional relationship with possibly changing time frames; as such, a less structured methodology is required.
As a well-established and versatile search heuristic, GAs seem to be a promising approach for generating pattern descriptors with predictive power. Although the mathematical foundations and properties of GAs are far from being settled, there is some evidence that GAs might become a generic tool for universal computation. For example, the work by Sapin et al. [3, 4] suggests that GAs have the potential to identify cellular automata that support universal computation.

This paper is organized as follows: in Section 2, we describe the encoding process for multidimensional time series and define patterns. The GA for pattern evolution is presented in Section 3. In Section 4, some related approaches are discussed. Section 5 contains an application to financial time series data and Section 6 concludes.

2. Time Series, Texts, and Patterns

Based on an algorithm for pattern matching in character strings [5], we implement a deterministic pattern matching machine that searches for occurrences of patterns in multidimensional time series $x = (x^1, \ldots, x^N)$, where $x^i = (x^i_1, \ldots, x^i_T)$ is a vector of $T$ observations. Before the search process takes place, the time series data is encoded as strings of letters from user-defined alphabets. Alphabets are sets composed of mathematical expressions (conditions) that yield the Boolean value true or false when evaluated with respect to $x$. For example, the condition $x^i_t > x^i_{t-1}$ returns true (false) at all dates $t$ at which the $i$th time series increases (weakly decreases). A sequence of conditions is called an alphabet if exactly one of them is true in each period. Hence, $\{x^3_t > x^3_{t-1},\ x^3_t \le x^3_{t-1}\}$ is an example of an alphabet with two mutually exclusive condition-letters.

Given a multidimensional time series $x$ and a set of alphabets $\{A_1, \ldots, A_K\}$, the following algorithm generates a $T \times K$ dimensional text $a = (a^k_t)$:

for each date $t = 1, \ldots, T$
    for each alphabet $k = 1, \ldots, K$
        $a^k_t \leftarrow$ the ordinal of the condition in $A_k$ that evaluates true with regard to $x$ at $t$.

Each column $a^k$ in the text $a$ represents the information in $x$ that is encoded through $A_k$. A generic element $a^k_t$ is an integer between one and the number of condition-letters in the alphabet $A_k$. We consider, therefore, the matrix $a$ as a multidimensional text recorded in natural numbers. Note that the number of time series $N$ in $x$ and the number of alphabets $K$ will usually differ. The diagram in Figure 1 illustrates the encoding of a fragment of five observations from a multidimensional time series $x = (x^1, \ldots, x^4)$ according to two alphabets $\{A_1, A_2\}$ with the resulting bidimensional text $a_t = (a^1_t, a^2_t)$ for $t = 1, \ldots, 5$.

The choice of alphabets is entrusted to the expertise of the end user of the system. Generally, the letters in the alphabets will test conditions on certain indicators that are deemed relevant for the subject under study. These indicators may be derived from economic variables in early warning systems for the prediction of financial crises [6, 7], from specific protein information in cancer detection systems [8, 9], or from technical trading rules [10-13]. The almost unrestricted freedom in the specification of alphabets is at the same time a strength and a weakness of the present approach. On the one hand, its inherent flexibility allows for immediate and fine-tuned application to many research areas but, on the other, it burdens the researcher with the tedious and ultimately open question of finding optimal alphabets. In Section 5, we illustrate the types of alphabets that can be used for encoding financial data.
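The encoding step of Section 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the paper: the function name `encode`, the row-per-date layout of `x`, the `start` parameter, and the toy data are all hypothetical. Each condition-letter is modeled as a function of the series and a date index, and the sketch checks that exactly one letter of every alphabet is true.

```python
def encode(x, alphabets, start=0):
    """Encode x into a text: text[r][k] is the 1-based ordinal of the
    single condition-letter of alphabet k that is true at date start + r."""
    text = []
    for t in range(start, len(x)):
        row = []
        for alphabet in alphabets:
            true_letters = [i + 1 for i, cond in enumerate(alphabet) if cond(x, t)]
            if len(true_letters) != 1:
                raise ValueError("an alphabet needs exactly one true letter per date")
            row.append(true_letters[0])
        text.append(row)
    return text


# Hypothetical toy data, one row per date: (weekday 1..5, closing price).
x = [(1, 1.00), (2, 1.02), (3, 1.01), (4, 1.01), (5, 1.05)]

# Alphabet 1: five letters, one per weekday.
weekday = [lambda x, t, d=d: x[t][0] == d for d in range(1, 6)]

# Alphabet 2: the series increases / weakly decreases.
direction = [
    lambda x, t: x[t][1] > x[t - 1][1],
    lambda x, t: x[t][1] <= x[t - 1][1],
]

# start=1 because the second alphabet looks one date back.
text = encode(x, [weekday, direction], start=1)
```

Because the conditions are evaluated once, before any pattern search, later stages only compare small integers in `text`.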

Figure 1. The encoding of a time series that includes dates. The time series $x^1$ contains the dates 5.12.2005 through 9.12.2005 that correspond to weekdays 1 through 5 (Monday through Friday).

Given a set of alphabets, a succinct description of a relevant aspect of the underlying data is expressed as a pattern. The latter is defined as a regular expression that is composed of letters, operators, and parentheses. A letter is represented by a pair [condition-letter number : alphabet] enclosed in square brackets. For example, [2 : 1] stands for the second condition-letter from the first alphabet. We consider the three fundamental operators concatenation, or, and and.

Concatenation is used, for instance, in the pattern [2 : 1] [3 : 2], where the second letter from the first alphabet is followed by the third letter from the second alphabet. Since the text $a$ consists of natural numbers and $a^k_t = i$ is interpreted as the $i$th letter of the $k$th alphabet at position $t$, this pattern matches fragments of $a$, starting at $t$, such that $a^1_t = 2$ and $a^2_{t+1} = 3$.

Or (+) between two subpatterns $P_1$ and $P_2$ implies that there is a match if and only if either $P_1$ or $P_2$ (or both) occurs. For example, the pattern [3 : 1] [5 : 1] + [4 : 2] [4 : 2] describes fragments of the text $a$ where $a^1_t = 3$ is followed by $a^1_{t+1} = 5$ or where $a^2_t = a^2_{t+1} = 4$.

And (*) between two subpatterns $P_1$ and $P_2$ implies that there is a match if and only if both $P_1$ and $P_2$ occur simultaneously. The pattern [3 : 1] * [4 : 2] [6 : 2] detects, therefore, fragments of $a$ such that $a^1_t = 3$ and $a^2_t = 4$, $a^2_{t+1} = 6$.

Finally, parentheses induce the desired order of operators in the usual way:

([3 : 1] + [4 : 1]) * [5 : 2] ≡ ($a^1_t = 3$ or $a^1_t = 4$) and $a^2_t = 5$,
[3 : 1] + ([4 : 1] * [5 : 2]) ≡ $a^1_t = 3$ or ($a^1_t = 4$ and $a^2_t = 5$).

A pattern that complies with the syntactic rules can be searched for in the encoded text. The following algorithm searches for matches of the pattern $p$ in the text $a$ between dates $T_1$ and $T_2$ (i.e., in all rows of $a$ between and including $a_{T_1}$ and $a_{T_2}$):

set $t \leftarrow T_1$;
while $t \le T_2$ repeat {
    if a match of length $k$ starts at date $t$ then record $(t, t + k - 1)$ in the set $M_p(T_1, T_2)$;
    set $t \leftarrow t + 1$;
}.

The outcome of the algorithm is exemplified in Figure 2, where the fragments of the text $a$ that are matched by the pattern specification $p$ = [1 : 1] * ([2 : 2] + [4 : 2] [5 : 3]) are enclosed in rectangles (the first rectangle matches [1 : 1] * [2 : 2] while the second matches [1 : 1] * ([4 : 2] [5 : 3])).

Figure 2. Fragments of the text $a$ that are matched by the pattern $p$.

The matching algorithm itself is based on the implementation of the deterministic finite state automaton in [5] with important modifications to account for multidimensional texts and operators. After running the matching program, the set $M_p(T_1, T_2)$ contains pairs $(t_s, t_e)$ with the start and end dates of matches. If two or more matches start on the same date, the match with the minimum length is recorded. If two or more matches end at the same time, only one of them is kept in $M_p(T_1, T_2)$.
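To make the operator semantics concrete, here is a minimal sketch of a matcher for such patterns. It is a naive recursive evaluator, not the finite state automaton of [5] that the paper builds on. The tuple representation of patterns, the function names, 0-based dates, and the toy text are assumptions made for this illustration. Two further assumptions fill points the text leaves open: an "and" match is taken to be as long as the longer of its two operands, and a match is recorded only if it ends no later than the last searched date.

```python
def match_lengths(p, a, t):
    """Set of lengths of all matches of pattern p that start at row t of a.

    Patterns are nested tuples: ("L", letter, alphabet) with a 1-based
    alphabet index, or (op, left, right) with op in {"cat", "or", "and"}.
    """
    kind = p[0]
    if kind == "L":
        _, letter, alphabet = p
        return {1} if t < len(a) and a[t][alphabet - 1] == letter else set()
    _, left, right = p
    if kind == "cat":  # right operand starts where the left one ends
        return {l1 + l2
                for l1 in match_lengths(left, a, t)
                for l2 in match_lengths(right, a, t + l1)}
    if kind == "or":  # either operand matches
        return match_lengths(left, a, t) | match_lengths(right, a, t)
    if kind == "and":  # both operands match from the same date (assumed length: max)
        return {max(l1, l2)
                for l1 in match_lengths(left, a, t)
                for l2 in match_lengths(right, a, t)}
    raise ValueError(f"unknown operator: {kind}")


def find_matches(p, a, t1, t2):
    """Scan dates t1..t2 (0-based, inclusive) and return (start, end) pairs.

    Per start date only the shortest match is kept; if two matches share
    an end date, only the first one found is kept.
    """
    matches, seen_ends = [], set()
    for t in range(t1, t2 + 1):
        lengths = match_lengths(p, a, t)
        if not lengths:
            continue
        end = t + min(lengths) - 1
        if end <= t2 and end not in seen_ends:
            seen_ends.add(end)
            matches.append((t, end))
    return matches


# The pattern [1 : 1] * ([2 : 2] + [4 : 2] [5 : 3]) on a hypothetical text.
p = ("and", ("L", 1, 1),
     ("or", ("L", 2, 2), ("cat", ("L", 4, 2), ("L", 5, 3))))
a = [[1, 2, 1], [2, 4, 5], [1, 4, 3], [3, 1, 5], [1, 3, 2]]
found = find_matches(p, a, 0, len(a) - 1)
```

In this toy text the first row matches through the [1 : 1] * [2 : 2] branch and rows three and four match through the concatenation branch, mirroring the two rectangles described for Figure 2.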

The elements of $M_p(T_1, T_2)$ will typically be used as signals of something, say, a financial crisis, a share price increase, or a tissue developing cancer. In order to evaluate the predictive power of patterns, the user must define a fitness function that maps the set of matches into real numbers. For example, if the vector $x^1$ contains stock prices at consecutive decision times (i.e., days or hours), then the function

$$\sum_{(t_s, t_e) \in M_p(T_1, T_2)} \ln \frac{x^1_{t_e + 1}}{x^1_{t_e}}$$

computes the accumulated profit that is made when the stock is bought whenever a match ends in the decision period. This function attains high values for patterns that consistently signal rising stock prices after match occurrences. It can be interpreted, therefore, as a measure of fitness for patterns that act as buy signals. An interesting point to note is that the patterns are searched for in the encoded text $a$, while the evaluation of the pattern fitness involves the original time series $x$.

Besides its use as a forecasting instrument, the present approach may be applied as a language in which quantitative theories are formulated. Suppose, for instance, that a researcher conjectures that a variable $x^1$ under study exceeds a desired level $\bar{x}^1$ if either the variable $x^2$ assumes values below $\bar{x}^2$ or the variable $x^3$ assumes values above $\bar{x}^3$. Then, after encoding $x = (x^1, x^2, x^3)$ in three alphabets, $A_k = \{x^k_t \le \bar{x}^k,\ x^k_t > \bar{x}^k\}$, $k = 1, \ldots, 3$, for some thresholds $\bar{x}^k$, this theory can be phrased in terms of the pattern [2 : 1] * ([1 : 2] + [2 : 3]) and matched with the data.

In Section 3 we develop a GA for pattern evolution. The aim of the GA is to create a population of patterns that are optimized with respect to a fitness function over a training set. Obviously, patterns bred in training samples will be reliable predictors only if they retain their predictive power in evaluation samples.
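The accumulated-profit fitness function from Section 2 can be sketched as follows. The function name, the 0-based dates, and the toy prices are assumptions for illustration; matches that end on the last observation are skipped here because no next price exists for them, a boundary choice the text does not spell out.

```python
import math


def accumulated_profit(matches, prices):
    """Sum of ln(prices[e + 1] / prices[e]) over the end dates e of matches."""
    total = 0.0
    for _, end in matches:
        if end + 1 < len(prices):  # skip matches with no following price
            total += math.log(prices[end + 1] / prices[end])
    return total


# Hypothetical prices and two matches ending at dates 0 and 2.
prices = [100.0, 110.0, 99.0, 108.9]
matches = [(0, 0), (1, 2)]
profit = accumulated_profit(matches, prices)
```

Both matches in the toy example are followed by a 10% price rise, so the fitness equals twice ln(1.1).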

3. Genetic Algorithm

A GA is a search technique for finding approximate solutions to optimization and search problems. GAs are typically implemented as a computer simulation in which a population of abstract representations (chromosomes) of candidate solutions to an optimization problem evolves toward better solutions. The present GA evolves patterns, that is, regular expressions that use the building blocks of letters, parentheses, and operators. By combining and modifying the best performing parent patterns, new generations of offspring with increasing average fitness are created. The present GA uses the three basic operations of crossover, mutation, and selection.

Crossover (xover) extracts fragments from two parent patterns and combines them by means of fundamental operators into a valid offspring pattern, as illustrated in the following example:

([1 : 1] + [1 : 2]) [2 : 3]  xover  [2 : 1] [2 : 2] * [3 : 1]  →  ([1 : 1] + [1 : 2]) + [3 : 1].

In this example, a combination of ([1 : 1] + [1 : 2]) and [3 : 1] is inherited by the offspring, while [2 : 3] and [2 : 1] [2 : 2] vanish.

Mutation changes a part of the pattern to a subpattern that matches a fragment randomly drawn from the encoded text $a$. In the next example, the expression in parentheses undergoes a mutation to the subpattern [1 : 2] * [3 : 3] that matches $a^2_t = 1$, $a^3_t = 3$, a randomly retrieved fragment of $a$:

[1 : 1] [2 : 1] ([1 : 1] + [3 : 2])  →mut  [1 : 1] [2 : 1] ([1 : 2] * [3 : 3]).

Selection picks out the best-performing patterns (with respect to the user-defined fitness function). Note that the result of the breeding process is a regular expression that complies with the syntactic rules for patterns. The structure of the main loop of the GA is depicted in Figure 3.
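The three operations can be sketched on a simple tree representation of patterns. Everything below is an illustrative assumption rather than the paper's implementation: the tuple encoding, the function names, the mutation rate, the way parents are drawn from the elite, and the toy fitness (which merely prefers short patterns so that the example runs without a matcher). The loop keeps the fittest patterns each round and refills the population with their offspring.

```python
import random

OPS = ("cat", "or", "and")  # concatenation, or (+), and (*)


def subpatterns(p):
    """All subtrees of a pattern, the pattern itself included."""
    if p[0] == "L":  # ("L", letter, alphabet)
        return [p]
    return [p] + subpatterns(p[1]) + subpatterns(p[2])


def replace(p, target, new):
    """Copy of p in which every subtree equal to target is swapped for new."""
    if p == target:
        return new
    if p[0] == "L":
        return p
    return (p[0], replace(p[1], target, new), replace(p[2], target, new))


def crossover(p, q, rng):
    """Join one random fragment of each parent with a random operator."""
    return (rng.choice(OPS), rng.choice(subpatterns(p)), rng.choice(subpatterns(q)))


def mutate(p, a, rng):
    """Replace a random part of p by a subpattern matching a random row of a."""
    row = rng.choice(a)
    k1, k2 = rng.sample(range(len(row)), 2)  # needs at least two alphabets
    new = ("and", ("L", row[k1], k1 + 1), ("L", row[k2], k2 + 1))
    return replace(p, rng.choice(subpatterns(p)), new)


def evolve(population, fitness, a, elite_size, rounds, rng, mutation_rate=0.2):
    """Elitist loop: the elite survives and breeds the rest of the population."""
    size = len(population)
    for _ in range(rounds):
        elite = sorted(population, key=fitness, reverse=True)[:elite_size]
        offspring = []
        while len(elite) + len(offspring) < size:
            p, q = rng.sample(elite, 2)  # needs elite_size >= 2
            child = crossover(p, q, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, a, rng)
            offspring.append(child)
        population = elite + offspring
    return sorted(population, key=fitness, reverse=True)


def toy_fitness(p):
    """Hypothetical placeholder: fewer building blocks is better."""
    return -len(subpatterns(p))


rng = random.Random(0)
a = [[1, 2, 1], [2, 4, 5], [1, 4, 3]]  # hypothetical encoded text
start = [("L", row[k], k + 1) for row in a for k in range(2)]
best = evolve(start, toy_fitness, a, elite_size=3, rounds=5, rng=rng)
```

Because offspring are always built from valid subtrees and operators, every pattern in the population remains syntactically valid, which is the property the section emphasizes. In a real run, `toy_fitness` would be replaced by a function that matches the pattern against the training window and scores the matches.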

Figure 3. The structure of the main loop of the pattern breeding GA.

4. Related Literature

Faced with abundant literature on data mining, we focus on three closely related papers in order to emphasize the main departures of the present work from the existing approaches.

Szpiro [13] implements a GA that permits the discovery of equations of the data-generating process in symbolic form. His GA uses parts of equations, constants, and the basic arithmetic operators to breed ever better formulas. Apart from furnishing a deeper understanding of the dynamics of a process, his method also permits global predictions and forecasts. Unlike his search for a hidden relationship, our GA does not work on raw data but on encoded information. This approach allows for including predictors (e.g., adaptive moving averages) that are very unlikely or impossible to be developed by Szpiro's algorithm. Furthermore, his algorithm is restricted to uncovering functional relationships whereas ours detects relevant patterns in data.

Dempster [14] applies a GA to evolve trading rules that are based on technical indicators. Potential rules are constructed as binary trees in which the terminal nodes are indicators (e.g., adaptive moving averages, relative strength index, stochastics, or momentum oscillators) yielding a Boolean signal at each time step, and the nonterminal nodes are the Boolean operators AND, OR, and XOR. The result of this procedure is a set of fittest trading rules that recommend a transaction (buy or sell) in each period. Unlike trading rules, patterns are not constrained to emit a buy/sell signal at each time step. They are more flexible in the sense that they can focus exclusively on informative sequences of observations. Furthermore, the algorithm in [14], like other commonly used algorithms for information extraction, works with data windows of fixed length. The GA described in Section 3 breeds patterns without knowing, at the time of the design, the number of observations that they match. Hence, it can create patterns that detect regularities which emerge after specific histories. In this manner, qualitatively identical phenomena that unfold on different time scales (fractal patterns) or stretch over time windows of variable length can be captured.

Finally, Packard [15] develops a GA that evolves a population of conditions, defined on a unidimensional independent variable $x$, as in the following example:

$$C = (20.1 \le x_t) \wedge (30 \le x_t \le 40.5) \wedge (x_{t+2} \le 30).$$

Packard's algorithm works on conditions, adjusting constants and operators in order to obtain good predictors for a dependent variable. This approach is similar in spirit to evolving expressions composed of condition-letters, as described in Section 2. Nevertheless, an attempt to include elaborate indicators in Packard's conditions leads to intolerable runtimes, as they must be evaluated during the matching phase for each date. Furthermore, an obvious extension of Packard's GA to multidimensional time series suffers severely from the curse of dimensionality.

5. An Application

5.1 Data and Alphabets

As an application, we tested the predictive power of patterns on financial time series data. We used the daily exchange rates for several currency pairs. The data was downloaded from http://www.forexrate.co.uk/forexhistoricaldata.php. For each pair and day $t$, the vector $x_t = (x^i_t)$, $i = 1, \ldots, 5$, contained $x^1_t$ = date, $x^2_t$ = open, $x^3_t$ = close, $x^4_t$ = min, $x^5_t$ = max, that is, the current date and the opening, closing, minimum, and maximum exchange rates during this day. We used 1201 weekday observations from August 25, 2003 through April 12, 2008 (hence, $x = (x^i_t)$ had the dimensions $1201 \times 5$). We encoded $x$ according to six alphabets, obtaining a text $a$ that is six-dimensional and 1200 characters long:

$A_1 = \{x^1_t = \text{Monday}, \ldots, x^1_t = \text{Friday}\}$,
$A_2 = \{x^3_t / x^3_{t-1} < 0.998,\ 0.998 \le x^3_t / x^3_{t-1} < 0.9985,\ \ldots,\ x^3_t / x^3_{t-1} \ge 1.002\}$,
$A_3 = \{x^3_t / x^4_t < 1.0005,\ 1.0005 \le x^3_t / x^4_t < 1.001,\ \ldots,\ x^3_t / x^4_t \ge 1.005\}$,
$A_4 = \{x^5_t / x^3_t < 1.0005,\ 1.0005 \le x^5_t / x^3_t < 1.001,\ \ldots,\ x^5_t / x^3_t \ge 1.005\}$,
$A_5 = \{x^3_t / x^2_t < 0.996,\ 0.996 < x^3_t / x^2_t \le 0.997,\ \ldots,\ x^3_t / x^2_t > 1.004\}$,
$A_6 = \{x^5_t / x^4_t < 1.0005,\ 1.0005 \le x^5_t / x^4_t < 1.001,\ \ldots,\ x^5_t / x^4_t \ge 1.005\}$.

All alphabets except $A_1$ were composed of nine condition-letters and all of them used only past and present information in $x$. In particular, each row $a_t$ of the text was generated by accessing information in $x_{t-1}$ and $x_t$ only. The requirement of using only available information is, obviously, essential when we test the predictive power of patterns.

5.2 Pattern Evaluation and the Fitness Function

In order to create effective patterns by means of a GA in-sample, and to assess their predictive power out-of-sample, a suitable definition of the fitness function is crucial. The fitness function that we employed was designed to measure the difference between sample means for two mutually exclusive and collectively exhaustive sets: the set of end-dates of matches and its complement. Specifically, for each pattern $p$ that was matched in the time window $[T_1, T_2]$, we partitioned this window into two groups: the subset $M$ of end-dates of $p$-matches and its complement $NM$. In each subset, we computed the sample mean and the sample variance for the next day log-returns,

$$\bar{x}_m = \frac{1}{n_m} \sum_{t \in M} \ln \frac{x^3_{t+1}}{x^2_{t+1}}, \qquad \bar{x}_{nm} = \frac{1}{n_{nm}} \sum_{t \in NM} \ln \frac{x^3_{t+1}}{x^2_{t+1}},$$

$$s^2_m = \frac{1}{n_m - 1} \sum_{t \in M} \left( \ln \frac{x^3_{t+1}}{x^2_{t+1}} - \bar{x}_m \right)^2, \qquad s^2_{nm} = \frac{1}{n_{nm} - 1} \sum_{t \in NM} \left( \ln \frac{x^3_{t+1}}{x^2_{t+1}} - \bar{x}_{nm} \right)^2,$$

where $n_m = |M|$ and $n_{nm} = |NM|$. For these values, we calculated the difference-of-means statistic,

$$t_{return} = \frac{\bar{x}_m - \bar{x}_{nm}}{\sqrt{s^2_m / n_m + s^2_{nm} / n_{nm}}},$$

and used it as both the fitness function for pattern breeding in the training set and as an estimate of predictive power in the validation set. The fitness function favored patterns indicating relatively high expected returns for the next day. Should the evolved patterns retain high fitness out-of-sample, our approach would be a (statistically) effective forecasting instrument.

We applied the same procedure to define the performance measure $t_{range}$ to evaluate patterns with respect to the next-day log difference in intraday extreme (min and max) values $\ln(x^5_{t+1} / x^4_{t+1})$. Parkinson [16] proposed the difference in extreme values as a proxy for volatility. We therefore considered patterns evolved with $t_{range}$ as indicators of high volatility.

5.3 Genetic Algorithm

After encoding the data and defining the fitness function, we ran a number of GA experiments. The main loop of each GA experiment (see Figure 3) evolved a population of $N = 100$ patterns, out of which the elite of $K = 15$ fittest survived each round and were selected to reproduce. Each breeding loop was repeated 50 000 times using a training window of 800 observations to compute the fitness. Subsequently, the single best performing pattern of the breeding stage was tested in an out-of-sample (validation) window of 400 observations. We experimented with different parameter values for the GA operators without, however, detecting a significant impact on the results. Furthermore, in order to avoid overfitting in the training set, we allowed only patterns with at least 10 matches per 100 observations to survive.
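The difference-of-means statistic of Section 5.2 can be sketched as below. The function name and the toy numbers are assumptions; `values[t]` stands for the next-day quantity attached to date `t` (a log-return or a log range), and `match_ends` plays the role of the set M. The standard library's `statistics.variance` is the sample variance with the n - 1 denominator, which is what the definition of the statistic calls for.

```python
import math
from statistics import mean, variance


def difference_of_means(values, match_ends):
    """t-statistic comparing values at match end-dates with all other dates."""
    m = [v for t, v in enumerate(values) if t in match_ends]
    nm = [v for t, v in enumerate(values) if t not in match_ends]
    if len(m) < 2 or len(nm) < 2:
        raise ValueError("both groups need at least two observations")
    std_error = math.sqrt(variance(m) / len(m) + variance(nm) / len(nm))
    return (mean(m) - mean(nm)) / std_error


# Hypothetical next-day values; dates 1 and 3 are match end-dates.
values = [1.0, 3.0, 2.0, 4.0, 0.0, 2.0]
t_stat = difference_of_means(values, {1, 3})
```

With samples as small as in this toy example the normal approximation used in Section 5.4 would of course not apply; the example only checks the arithmetic.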

5.4 Results

Table 1 summarizes the results of the GA experiment, which were computed as averages over 10 runs. Broadly speaking, Table 1 confirms the well-known stylized facts that returns are not predictable but volatility is (see [17] for a survey). Specifically, given the high numbers of matches in validation sets (Table 1), we could rely on the central limit theorem and assume that the statistic $t_{return}$ is standard normal under the null hypothesis of equal means in the subsets $M$ and $NM$ of the validation set. As the third column in Table 1 shows, the null could not be rejected at any reasonable significance level for any currency pair. In other words, the best performing pattern in the training set (with the fitness reported in the second column) failed to detect matches in the validation set that were followed by significantly higher log returns for the next day.

On the other hand, the null was rejected at the 1% significance level at least, and usually at a much lower level, when tested for the log difference in extreme values with the $t_{range}$ statistic. Only for the pair GBP/USD in the validation set did we obtain the P-value(2.34) ≈ 1%. This is probably due to the relatively small number of matches (45) in this set. The next largest P-value in the validation set is of order $10^{-6}$ ($t_{range} = 4.36$ for GBP/CHF). The winning pattern in the training set effectively detected next-day high volatility also out-of-sample (the last column in Table 1). Our approach was, therefore, successful at creating (statistically) reliable predictors of volatility.

Currencies   t_return training   t_return validation   t_range training   t_range validation
GBP/EUR      5.23 (220)          -1.09 (107)           7.14 (202)         5.04 (97)
GBP/USD      5.34 (212)           0.31 (132)           7.20 (172)         2.34 (45)
GBP/CHF      5.90 (163)          -0.45 (71)            7.02 (190)         4.36 (92)
USD/EUR      5.66 (188)           0.77 (163)           7.35 (200)         4.72 (89)

Table 1. In-sample (training) and out-of-sample (validation) t-statistics for one-day-ahead prediction of returns $\ln(x^3_{t+1} / x^2_{t+1})$ ($t_{return}$) and volatility $\ln(x^5_{t+1} / x^4_{t+1})$ ($t_{range}$). In parentheses, the number of matches.
To compare our procedure with a standard technique of volatility forecasting, we tested the forecasts generated by the exponentially weighted moving average (EWMA). EWMA is widely used in practice due to its simplicity and its reported superiority over more sophisticated models [18]. The EWMA specifies the next period's volatility $v_{t+1}$ as a weighted average of the current modeled volatility $v_t$ and the current observed volatility, here measured by the price range $\ln(x^5_t / x^4_t)$:

$$v_{t+1} = \alpha v_t + (1 - \alpha) \ln \frac{x^5_t}{x^4_t}.$$

For the same data as in the GA experiment, we estimated the EWMA parameter $\alpha$ in the training window of 800 observations by the maximum likelihood method and specified the threshold $\bar{v}$ of high volatility. This threshold was set equal to the third quartile of the empirical distribution of observed volatilities in order to create a similar number of high volatility days as in the GA experiment, that is, roughly 1/4 of all observations in the sample. Using the given specifications, we partitioned the validation set of 400 observations into two groups: the subset $M$ of high volatility forecasts, $v_t > \bar{v}$, and its complement $NM$. For each subset, we computed the sample mean and the sample variance of observed volatilities $\ln(x^5_t / x^4_t)$ and calculated the difference-of-means statistic. The results, reported in Table 2, indicate that EWMA forecasts of high volatility are statistically significant, although (with the exception of the GBP/USD pair) the t-statistics lie below the values from the GA experiment (Table 1). In this simple example, the parsimony of the EWMA approach may outweigh its slightly worse performance as compared to the elaborate GA procedure. The latter procedure, however, is designed to detect complicated multidimensional relationships where its full strength can come to the fore.

Currencies   t_range training   t_range validation
GBP/EUR      6.27 (202)         4.81 (97)
GBP/USD      5.80 (172)         3.73 (45)
GBP/CHF      5.22 (190)         3.76 (92)
USD/EUR      6.05 (200)         4.28 (89)

Table 2. In-sample (training) and out-of-sample (validation) t-statistics for one-day-ahead EWMA forecasts. The forecasts were computed with the ML estimate $\hat{\alpha} = 0.94$. In parentheses, the number of high volatility days.
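The EWMA recursion used for the benchmark can be sketched as follows. The function name, the starting value `v0`, and the toy highs and lows are assumptions; the smoothing parameter is simply passed in, whereas the paper estimates it by maximum likelihood on the training window.

```python
import math


def ewma_forecasts(high, low, alpha, v0):
    """Return forecasts where forecasts[t] is the volatility predicted for
    date t + 1: alpha * v_t + (1 - alpha) * ln(high[t] / low[t])."""
    v = v0
    forecasts = []
    for h, l in zip(high, low):
        v = alpha * v + (1 - alpha) * math.log(h / l)
        forecasts.append(v)
    return forecasts


# Hypothetical two-day example with log ranges of exactly 1 and 0.
high = [math.e, 1.0]
low = [1.0, 1.0]
forecasts = ewma_forecasts(high, low, alpha=0.5, v0=0.0)

# Days flagged as high-volatility forecasts for a hypothetical threshold.
threshold = 0.3
high_days = [v > threshold for v in forecasts]
```

The flagged days would then form the subset M to which the same difference-of-means statistic as in the GA experiment is applied.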
6. Conclusions

Based on an algorithm for pattern matching in character strings, we implement a pattern-matching machine that searches for occurrences of specified patterns in multidimensional time series. Before the search process takes place, the time series are encoded as strings of letters from user-defined alphabets. The preprocessing of the raw data has conceptual advantages and also speeds up the matching phase decisively. Since the evaluation of patterns is based on their matches, an efficient matching algorithm is essential for creating optimal patterns by means of a genetic algorithm (GA). The GA combines parent patterns in order to breed offspring (randomly modified by mutations) that are ever better predictors. In an application to financial time series, we show that the presented GA has the potential to produce patterns with significant out-of-sample predictive power.

References

[1] W. Wobcke and M. Zhang, eds., Advances in Artificial Intelligence: 21st Australasian Joint Conference on Artificial Intelligence (AI 2008), Auckland, New Zealand, Berlin: Springer-Verlag, 2009.
[2] R. Dahlhaus, J. Kurths, P. Maass, and J. Timmer, eds., Mathematical Methods in Time Series Analysis and Digital Image Processing, Springer, 2008.
[3] E. Sapin, O. Bailleux, and J. Chabrier, "Research of Complexity in Cellular Automata through Evolutionary Algorithms," Complex Systems, 17(3), 2007 pp. 231-241.
[4] E. Sapin and L. Bull, "Evolutionary Search for Cellular Automata Logic Gates with Collision-Based Computing," Complex Systems, 17(4), 2008 pp. 321-338.
[5] R. Sedgewick, Algorithms, Reading, MA: Addison-Wesley, 1988.
[6] F. X. Diebold and G. D. Rudebusch, "Scoring the Leading Indicators," Journal of Business, 62(3), 1989 pp. 369-91.
[7] G. Kaminsky, S. Lizondo, and C. Reinhart, "Leading Indicators of Currency Crises," IMF Staff Papers, 45(1), 1998 pp. 1-48.
[8] B. L. Adam et al., "Serum Protein Fingerprinting Coupled with a Pattern-Matching Algorithm Distinguishes Prostate Cancer from Benign Prostate Hyperplasia and Healthy Men," Cancer Research, 62(13), 2002 pp. 3609-3614.
[9] E. F. Petricoin et al., "Serum Proteomic Patterns for Detection of Prostate Cancer," Journal of the National Cancer Institute, 94(20), 2002 pp. 1576-1578.
[10] H. Iba and N. Nikolaev, "Genetic Programming Polynomial Models of Financial Data Series," in Proceedings of the 2000 Congress on Evolutionary Computation (CEC 00), La Jolla, CA, New York: IEEE Press, 2000 pp. 1459-1466.
[11] R. Levich and L. Thomas, "The Significance of Technical Trading-Rule Profits in the Foreign Exchange Market: A Bootstrap Approach," Journal of International Money and Finance, 12(5), 1993 pp. 451-474.
[12] C. Neely, P. Weller, and R. Dittmar, "Is Technical Analysis in the Foreign Exchange Market Profitable? A Genetic Programming Approach," Journal of Financial and Quantitative Analysis, 32(4), 1997 pp. 405-426.
[13] G. Szpiro, "A Search for Hidden Relationships: Data Mining with Genetic Algorithms," Computational Economics, 10(3), 1997 pp. 267-277.
[14] M. A. H. Dempster, T. W. Payne, Y. Romahi, and G. W. P. Thompson, "Computational Learning Techniques for Intraday FX Trading Using Popular Technical Indicators," in IEEE Transactions on Neural Networks, 12(4), 2001 pp. 744-754.
[15] N. Packard, "A Genetic Learning Algorithm for the Analysis of Complex Data," Complex Systems, 4(5), 1990 pp. 543-572.
[16] M. Parkinson, "The Extreme Value Method for Estimating the Variance of the Rate of Return," Journal of Business, 53(1), 1980 pp. 61-65.
[17] T. Bollerslev, R. Chou, and K. Kroner, "ARCH Modeling in Finance: A Review of the Theory and Empirical Evidence," Journal of Econometrics, 52(1-2), 1992 pp. 5-59.
[18] C. Guermat and R. D. F. Harris, "Forecasting Value at Risk Allowing for Time Variation in the Variance and Kurtosis of Portfolio Returns," International Journal of Forecasting, 18(3), 2002 pp. 409-419.