TEMPORAL PATTERN IDENTIFICATION OF TIME SERIES DATA USING PATTERN WAVELETS AND GENETIC ALGORITHMS

RICHARD J. POVINELLI AND XIN FENG
Department of Electrical and Computer Engineering
Marquette University, P.O. Box 1881, Milwaukee, WI 53201-1881, USA
E-mail: povinellir@mu.edu
Ph: 414.288.6820 Fx: 414.288.5579

ABSTRACT: A new method for temporal pattern matching of a time series is developed using pattern wavelets and genetic algorithms. The pattern wavelet is applied to the matching of an embedded time series. A problem-specific fitness factor is introduced in the new algorithm, which is useful to construct a fitness function of the feature space. A two-step process discovers the pattern wavelet that yields high fitness value. The best temporal pattern matches are found through a thresholding process. These matches are kept and the future time series data point is used in the genetic algorithm's fitness function. The algorithm has been successfully applied to the identification of statistically significant temporal patterns in financial time series data.

Keywords: Temporal Pattern Identification, Genetic Algorithms, Pattern Recognition, Time Series Analysis, Wavelets

INTRODUCTION

Data mining is the exploration of data with the goal of discovering hidden structure. In many real-world applications, it is important to study the change of temporal features of a non-stationary time series, and identify the ones that are representing the significance of time instances. For example, it is critical in stock market applications that the patterns relating to sudden stock price changes be identified. Generally such time series are considered non-stationary. Traditional time series analysis employs statistical methods to model and explain the data and predict future values of the time series. It is not easy, however, to identify the critical temporal patterns of the time series using these traditional methods. Using a set of observations, in this paper, we present a new method for time series data mining. By introducing a pattern wavelet along with the use of a genetic algorithm (GA), temporal patterns can be effectively revealed in non-stationary time series.

The paper is organized as follows. After presenting the problem statement, traditional ARMA modeling is reviewed. The ideas of temporal pattern matching and the pattern wavelet are then discussed. Next, a detailed discussion of the new algorithm is provided. Finally, a presentation of the results and conclusions is given.

PROBLEM STATEMENT

Let $Z = \{z_t,\ t = 1, \ldots, N\}$ be the non-stationary target time series, whose temporal features evolve over time. The task is to find an approach to characterize these changing temporal features. Applying traditional time series modeling to this problem involves finding solutions to the Box-Jenkins difference equation (Bowerman and O'Connell 1993)

$$\phi_p(B)\, z_t = \delta + \theta_q(B)\, a_t,$$

where $\phi_p(B)$ is the nonseasonal autoregressive operator of order p, $\theta_q(B)$ is the nonseasonal moving average operator of order q, $z_t$ is the time series, $a_t$ is a sequence of random variables, $\delta$ is a constant term, and $B$ is the backshift operator. The Box-Jenkins method is limited by the requirement of stationarity of the time series and normality and independence of the residuals. However, in most applications, these conditions are not met. One of the most severe drawbacks of this approach is the loss of the non-stationary characteristics we desire to identify.

Our method takes a new approach. Let $\mathbf{z}_t = (z_t, z_{t+1}, \ldots, z_{t+(Q-1)})^T$, $t = 1, \ldots, N-Q+1$, be the set of sub-time series of length Q embedded in Z, where $Q \le N$. Clearly, $\mathbf{z}_t \subset Z$, which may represent the changing temporal features or patterns of Z. We propose that by studying the embedding $\mathbf{z}_t$, the temporal features of Z may be identified.

The method for eliciting the temporal features from the embedding $\mathbf{z}_t$ arises from a study of wavelets and the wavelet transform. The wavelet transform is a natural extension of Fourier's work done in the early 19th century. Where Fourier's transform can find frequency information with no time reference or time information with no frequency, the wavelet transform provides both time and frequency information. Generally speaking, the wavelet transform matches a compactly supported function, called a wavelet, across both scale (frequency) and translation (time) (Polikar 1996). The Fourier transform matches an infinitely supported function across frequency (scale). Both use convolution of the basis function and the original time series.
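The embedding of Z into overlapping sub-series of length Q can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function name `embed`, the 0-based indexing, and the toy price list are not from the paper.

```python
from typing import List, Sequence


def embed(series: Sequence[float], q: int) -> List[List[float]]:
    """Return every length-q window (z_t, ..., z_{t+q-1}) of the series."""
    n = len(series)
    if not 1 <= q <= n:
        raise ValueError("q must satisfy 1 <= q <= len(series)")
    # There are n - q + 1 windows; window t (0-based) starts at index t.
    return [list(series[t:t + q]) for t in range(n - q + 1)]


if __name__ == "__main__":
    # Made-up values for illustration only, not data from the paper.
    toy_prices = [10.0, 10.5, 10.2, 11.0, 10.8]
    for window in embed(toy_prices, 3):
        print(window)
```

A series of length N yields N-Q+1 windows, which is why the index t in the text runs from 1 to N-Q+1.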
For the wavelet transform, the convolution is computed at all scales. Next we introduce the so-called pattern wavelet and pattern wavelet transform. This transform is an extension of a discrete form of the wavelet transform applied specifically to identifying temporal features.

PATTERN WAVELETS

By relaxing the restrictions of the wavelet transform, the pattern wavelet transform is derived. Where the wavelet transform uses the convolution of the wavelet and the time series, the pattern wavelet transform uses a subset of the convolution of the pattern wavelet and the time series. Also, where the wavelet is required to have a zero mean, the pattern wavelet is not. These relaxations yield a transform that identifies the temporal features discussed in the problem statement. A detailed explanation of the algorithm follows.

Let $f(\mathbf{p}, \delta, Z, g)$ be the pattern wavelet transform, where $\mathbf{p} \in P \subset \mathbb{R}^Q$ is the pattern wavelet, $\delta \in \mathbb{R}$ is a threshold parameter, and $g = g(\mathbf{z}_t)$ is a measure of fitness of the temporal feature. We want to find the optimal solution to the following problem

$$\max_{\mathbf{p},\,\delta} \left\{ f(\mathbf{p}, \delta, Z, g) : \mathbf{p} \in P \subset \mathbb{R}^Q,\ \delta \in \mathbb{R} \right\}. \tag{1}$$

The pattern wavelet transform $f(\mathbf{p}, \delta, Z, g)$ is the fitness of pattern $\mathbf{p}$ with threshold $\delta$ applied to time series Z with fitness measure g. The following definitions are needed for f.

$$r_t = \mathbf{p} \cdot \mathbf{z}_t, \quad t = 1, \ldots, N-Q+1$$

$$\mu_r = \frac{1}{N-Q+1} \sum_{t=1}^{N-Q+1} r_t$$

$$\sigma_r^2 = \frac{1}{N-Q+1} \sum_{t=1}^{N-Q+1} (r_t - \mu_r)^2$$

$$M = \{\, t : r_t \ge \mu_r + \delta \sigma_r \,\}$$

The vector $\mathbf{z}_t \subset Z$ is the embedded series of length Q, where $Q \le N$. The pattern factors $r_t$, $t = 1, \ldots, N-Q+1$, are elements of the vector $\mathbf{r} \in \mathbb{R}^{N-Q+1}$, which consists of the N-Q+1 inner products of the pattern wavelet $\mathbf{p}$ and the embedded time series $\mathbf{z}_t$. Also, $\mu_r$ denotes the mean of $\mathbf{r}$, $\sigma_r$ is the standard deviation of $\mathbf{r}$, and M is the pattern match set, which is defined as the set of all time instances where the pattern factor $r_t$ is greater than or equal to the threshold $\mu_r + \delta \sigma_r$. Finally, the pattern wavelet transform f is defined as the mean of $g(\mathbf{z}_t)$ for $t \in M$,

$$f(\mathbf{p}, \delta, Z, g) = \mu_M = \frac{1}{c(M)} \sum_{t \in M} g(\mathbf{z}_t), \tag{2}$$

where c(M) is the cardinality of M. Also, $\sigma_M$ is the standard deviation of $g(\mathbf{z}_t)$ at times $t \in M$:

$$\sigma_M^2 = \frac{1}{c(M)} \sum_{t \in M} \left( g(\mathbf{z}_t) - \mu_M \right)^2$$

It should be noted that the selection of the fitness operator g in (2) is problem specific and is independent of the algorithm. It should be chosen a priori based on the types of hidden temporal features to be discovered.
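The definitions of the pattern factors, the match set M, and the transform f can be traced step by step in a minimal Python sketch. Several choices here are assumptions made for illustration and are not taken from the paper: the function names, the example fitness measure `relative_change` (relative change of the point that follows a window), scoring only windows that have a following point, and returning negative infinity when M is empty.

```python
from math import sqrt
from typing import Callable, List, Sequence, Tuple


def relative_change(series: Sequence[float], t: int, q: int) -> float:
    """Illustrative fitness g: relative change of the point just after window t."""
    return (series[t + q] - series[t + q - 1]) / series[t + q - 1]


def pattern_wavelet_transform(
    p: Sequence[float],
    delta: float,
    series: Sequence[float],
    g: Callable[[Sequence[float], int, int], float] = relative_change,
) -> Tuple[float, List[int]]:
    """Return (f, M): mean fitness over the match set, and the match set."""
    q = len(p)
    # Assumption: only windows followed by a future point are scored,
    # so there are len(series) - q windows rather than len(series) - q + 1.
    n_windows = len(series) - q
    if n_windows < 1:
        raise ValueError("series must be longer than the pattern")
    # Pattern factors r_t: inner product of p with each embedded window.
    r = [sum(pi * series[t + i] for i, pi in enumerate(p))
         for t in range(n_windows)]
    mu_r = sum(r) / len(r)
    sigma_r = sqrt(sum((x - mu_r) ** 2 for x in r) / len(r))
    # Match set M: times where r_t reaches the threshold mu_r + delta * sigma_r.
    threshold = mu_r + delta * sigma_r
    matches = [t for t, x in enumerate(r) if x >= threshold]
    if not matches:
        # Assumption: an empty match set is treated as the worst fitness.
        return float("-inf"), matches
    f = sum(g(series, t, q) for t in matches) / len(matches)
    return f, matches


if __name__ == "__main__":
    # Made-up series and pattern for illustration only.
    toy = [1.0, 2.0, 1.0, 2.0, 4.0, 2.0, 1.0, 2.0, 4.0]
    fitness, match_set = pattern_wavelet_transform([-1.0, 1.0], 0.5, toy)
    print(fitness, match_set)
```

Raising delta shrinks M to the windows that resemble the pattern most strongly, so f averages g over fewer but better matches.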
Because the maximization problem in (1) is complex and nonlinear, it is difficult to solve using traditional numerical optimization methods. To overcome these limitations, a roulette wheel based GA with elitism (Goldberg 1989) searches for the optimal $\mathbf{p}$ and $\delta$. Ideally $\mathbf{p} \in \mathbb{R}^Q$ and $\delta \in \mathbb{R}$, but for efficiency purposes the search is restricted to $\mathbf{p} \in [-\varepsilon, \varepsilon]^Q$ and $\delta \in [\delta_1, \delta_2]$. These ranges are discrete due to the nature of the GA: each $p_i$ and $\delta$ takes one of $2^b$ possible values, where b is the number of bits used to represent it. The parameters for the GA are Q, Z, g, b, and the population size. The parameter b is usually in the range of 4 to 6 and the population size is set to 30. The most elite individual is maintained from generation to generation without change. No mutation is used. The GA is shown below.

Pattern Finding Genetic Algorithm
1. Create an elite population
   a) Randomly generate a large population (10 times the normal population size)
   b) Calculate fitness
   c) Select the top 10th of the population to continue
2. While the fitness values have not all converged
   a) Perform roulette selection, save the elite individual
   b) Crossover population
   c) Calculate fitness

APPLICATION RESULTS

The goal of this application is to find hidden temporal patterns in a certain stock time series. Our experimental time series is the daily open stock price of Quantum (QNTM, traded on the NASDAQ), $Z = \{z_t,\ t = 1, \ldots, N\}$ with N = 3,76. See Figure 1 for illustration. Obviously, this time series is non-stationary. Our special interest is to identify the temporal pattern that is related to a significant price change.

Figure 1 - Quantum Corp stock time series

ARMA Model

Two ARMA models of the time series reveal essentially the same random walk characteristics. The models are
$$z_t = \phi_1 z_{t-1} + \varepsilon_t \tag{3}$$

$$z_t = z_{t-1} + \phi_1 z_{t-1} - \phi_1 z_{t-2} + \varepsilon_t \tag{4}$$

$$z_t = z_{t-1} + \varepsilon_t \tag{5}$$

where $\phi_1 = 0.99933$ in (3) and $\phi_1 = 0.045948$ in (4). The $\phi_1$ in both models is statistically significant, but the autocorrelations of (3) show strong evidence of nonstationarity and the Ljung-Box test of the residuals indicates a lack of independence. For model (4), the Ljung-Box test of the residuals indicates independence. Noting that $\phi_1 \approx 1$ in (3) and $\phi_1 \approx 0$ in (4), both models become equivalent to (5). The ARMA models provide little insight into hidden structure in the time series; the series is a random walk. On the other hand, the method presented by the authors finds statistically significant structure, as presented below.

Pattern Wavelet Model

In building the pattern wavelet model, the fitness operator g in (2) is chosen as

$$g(\mathbf{z}_t) = \frac{z_{t+Q} - z_{t+(Q-1)}}{z_{t+(Q-1)}}.$$

In our case we want to find features that indicate a positive percentage change in price immediately after the end of the pattern match. We found c(M) to be between 38 and 34, depending on the support of the pattern wavelet. The statistics for eight patterns are given in Table 1. The change in the stock price after a pattern match was between +0.7% and +1.5%, whereas the average change was +0.2%. This shows that there is a correlation between the patterns and the price changes. The standard deviation, though, is between 3% and 4% for the patterns and 3% for the average day. The $\mu_M$ of the matched patterns is between 5 to 2 higher than $\mu_{g(z)}$ of the whole time series.

Two statistical tests are used to show significance of the results. The first test is the runs test. The test hypothesis is

H_0: There is no difference between the matched time series and the remaining time series.
H_A: There is significant difference between the matched time series and the remaining time series.

Our test uses a 1% probability of Type I error ($\alpha = 0.01$). Table 1 shows that the null hypothesis can easily be rejected in all cases.

The second statistical test is the difference of two independent means. The two populations are the transformed series and the whole time series. Although the two populations are probably dependent, this can be ignored because it makes the statistics more conservative, i.e., it will tend to overestimate the Type I error. The test hypothesis is

H_0: $\mu_M - \mu_{g(z)} = 0$,
H_A: $\mu_M - \mu_{g(z)} > 0$.

This test uses a 1% probability of Type I error ($\alpha = 0.01$). Again, Table 1 shows that the null hypothesis can be very confidently rejected for all the patterns. The mean fitness of the time series is $\mu_{g(z)} = 0.0079$, and $\sigma_{g(z)} = 0.03293$.
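The one-sided difference-of-means test can be sketched as follows. The paper does not state the exact test statistic, so this sketch assumes a large-sample z-test with unpooled variances; the function name and the numbers in the usage example are hypothetical and are not the values reported in Table 1.

```python
from math import erfc, sqrt
from typing import Tuple


def one_sided_mean_difference_test(
    mean_m: float, sd_m: float, n_m: int,
    mean_all: float, sd_all: float, n_all: int,
) -> Tuple[float, float]:
    """Return (z, p) for H0: mean_m - mean_all = 0 vs HA: mean_m - mean_all > 0."""
    # Assumption: unpooled standard error of the difference of two means.
    standard_error = sqrt(sd_m ** 2 / n_m + sd_all ** 2 / n_all)
    z = (mean_m - mean_all) / standard_error
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * erfc(z / sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    z, p = one_sided_mean_difference_test(0.012, 0.04, 200, 0.002, 0.03, 3000)
    print(f"z = {z:.3f}, p = {p:.3g}, reject H0 at alpha = 0.01: {p < 0.01}")
```

The null hypothesis is rejected when the returned p-value falls below the chosen $\alpha$.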
TABLE 1 - STATISTICAL SIGNIFICANCE OF RESULTS

Q    c(M)   µ_M       σ_M      Runs test α     Means test α
1    238    0.00736   0.0385   <1.00x10^-17    8.8x10^-3
2    67     0.00834   0.0375   <1.00x10^-17    7.58x10^-3
3    357    0.00746   0.0336   <1.00x10^-17    3.64x10^-4
4    85     0.0093    0.047    4.78x10^-10     5.30x10^-3
9    20     0.0057    0.046    <1.00x10^-17    8.28x10^-4
2    44     0.0397    0.0362   <1.00x10^-17    1.5x10^-5
27   90     0.0276    0.0406   4.44x10^-6      5.55x10^-5
39   20     0.03      0.0348   <1.00x10^-17    2.56x10^-5

CONCLUSIONS

In this paper, a new method for temporal data mining is proposed. Using a pattern wavelet transform as a data mining tool has yielded meaningful results. Instead of forcing the wavelet to match everywhere, it matches only when there is a high similarity between the pattern wavelet and the underlying time series. To find such pattern wavelets, a genetic algorithm is used. Even with a complex, non-stationary time series like stock price, the algorithm detected interesting patterns. Across all tested Q the patterns found were statistically significant. The algorithm is flexible in that by using an alternative fitness function g, different structures can be found. The g used in this research was for positive changes, but just as easily $g(\mathbf{z}_t) = (z_{t+(Q-1)} - z_{t+Q}) / z_{t+(Q-1)}$ could be used, which would find negative changes. Also, a more complicated g could be used that could take into account the standard deviations of the matches. Future research directions will include exploring combinations of patterns, looking for patterns in shorter segments of the time series, and adding additional factor dimensions such as volume.

REFERENCES

Bowerman, B. L., and O'Connell, R. T. (1993). Forecasting and Time Series: An Applied Approach, Duxbury Press, Belmont, California.
Ghoshray, S. (1996). Hybrid prediction technique by fuzzy inferencing on the chaotic nature of time series data. Artificial Neural Networks in Engineering, Proceedings, 725-730.
Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning, Addison-Wesley Pub. Co., Reading, Mass.
Lin, C. T., and Lee, C. S. G. (1996). Neural Fuzzy Systems - A Neuro-Fuzzy Synergism to Intelligent Systems, Prentice-Hall, Upper Saddle River, NJ.
Polikar, R. (1996). The Engineer's Ultimate Guide To Wavelet Analysis - The Wavelet Tutorial.
Weigend, A. S., and Gershenfeld, N. A. (1994). Time Series Prediction: Forecasting the Future and Understanding the Past, Addison-Wesley Pub. Co., Reading, MA.