TEMPORAL PATTERN IDENTIFICATION OF TIME SERIES DATA USING PATTERN WAVELETS AND GENETIC ALGORITHMS




RICHARD J. POVINELLI AND XIN FENG
Department of Electrical and Computer Engineering
Marquette University, P.O. Box 1881, Milwaukee, WI 53201-1881, USA
E-mail: povinellir@mu.edu
Ph: 414.288.6820 Fx: 414.288.5579

ABSTRACT: A new method for temporal pattern matching of a time series is developed using pattern wavelets and genetic algorithms. The pattern wavelet is applied to the matching of an embedded time series. A problem-specific fitness factor is introduced in the new algorithm and is used to construct a fitness function over the feature space. A two-step process discovers the pattern wavelet that yields a high fitness value. The best temporal pattern matches are found through a thresholding process. These matches are kept, and the future time series data point is used in the genetic algorithm's fitness function. The algorithm has been successfully applied to the identification of statistically significant temporal patterns in financial time series data.

Keywords: Temporal Pattern Identification, Genetic Algorithms, Pattern Recognition, Time Series Analysis, Wavelets

INTRODUCTION

Data mining is the exploration of data with the goal of discovering hidden structure. In many real-world applications, it is important to study how the temporal features of a non-stationary time series change, and to identify the features that mark significant time instances. For example, it is critical in stock market applications that the patterns relating to sudden stock price changes be identified. Generally such time series are considered non-stationary. Traditional time series analysis employs statistical methods to model and explain the data and to predict future values of the time series. It is not easy, however, to identify the critical temporal patterns of the time series using these traditional methods. In this paper, we present a new method for time series data mining that works from a set of observations. By introducing a pattern wavelet along with the use of a genetic algorithm (GA), temporal patterns can be effectively revealed in non-stationary time series.
The paper is organized as follows. After presenting the problem statement, traditional ARMA modeling is reviewed. The ideas of temporal pattern matching and the pattern wavelet are then discussed. Next, a detailed discussion of the new algorithm is provided. Finally, a presentation of the results and conclusions is given.

PROBLEM STATEMENT

Let $Z = \{z_t,\ t = 1, \ldots, N\}$ be the non-stationary target time series, whose temporal features evolve over time. The task is to find an approach to characterize these changing temporal features. Applying traditional time series modeling to this problem involves finding solutions to the Box-Jenkins difference equation (Bowerman and O'Connell 1993)

$$\phi_p(B)\, z_t = \delta + \theta_q(B)\, a_t,$$

where $\phi_p(B)$ is the nonseasonal autoregressive operator of order $p$, $\theta_q(B)$ is the nonseasonal moving average operator of order $q$, $z_t$ is the time series, $a_t$ is a sequence of random variables, $\delta$ is a constant term, and $B$ is the backshift operator. The Box-Jenkins method is limited by the requirement of stationarity of the time series and of normality and independence of the residuals. However, in most applications, these conditions are not met. One of the most severe drawbacks of this approach is the loss of the non-stationary characteristics we desire to identify.

Our method takes a new approach. Let

$$\mathbf{z}_t = (z_t, \ldots, z_{t+Q-1})^T, \quad t = 1, \ldots, N-Q+1,$$

be the set of sub-time series of length $Q$ embedded in $Z$, where $Q \le N$. Clearly, $\mathbf{z}_t \subset Z$, and these sub-series may represent the changing temporal features or patterns of $Z$. We propose that by studying the embedding $\mathbf{z}_t$, the temporal features of $Z$ may be identified.

The method for eliciting the temporal features from the embedding $\mathbf{z}_t$ arises from a study of wavelets and the wavelet transform. The wavelet transform is a natural extension of Fourier's work done in the early 19th century. Where Fourier's transform can find frequency information with no time reference, or time information with no frequency, the wavelet transform provides both time and frequency information. Generally speaking, the wavelet transform matches a compactly supported function, called a wavelet, across both scale (frequency) and translation (time) (Polikar 1996). The Fourier transform matches an infinitely supported function across frequency (scale). Both use convolution of the basis function and the original time series.
For the wavelet transform, this convolution is provided at all scales. Next we introduce the so-called pattern wavelet and pattern wavelet transform. This transform is an extension of a discrete form of the wavelet transform applied specifically to identifying temporal features.

PATTERN WAVELETS

By relaxing the restrictions of the wavelet transform, the pattern wavelet transform is derived. Where the wavelet transform uses the convolution of the wavelet and the time series, the pattern wavelet transform uses a subset of the convolution of the pattern wavelet and the time series. Also, where the wavelet is required to have a zero mean, the pattern wavelet is not. These relaxations yield a transform that identifies the temporal features discussed in the problem statement. A detailed explanation of the algorithm follows.

Let $f(p, \delta, Z, g)$ be the pattern wavelet transform, where $p \in P \subset R^Q$ is the pattern wavelet, $\delta \in R$ is a threshold parameter, and $g = g(\mathbf{z}_t)$ is a measure of fitness of the temporal feature. We want to find the optimal solution to the following problem:

$$\max_{p,\,\delta} \left\{ f(p, \delta, Z, g) : p \in P \subset R^Q,\ \delta \in R \right\}. \qquad (1)$$

The pattern wavelet transform $f(p, \delta, Z, g)$ is the fitness of pattern $p$ with threshold $\delta$ applied to time series $Z$ with fitness measure $g$. The following definitions are needed for $f$:

$$r_t = p \cdot \mathbf{z}_t, \quad t = 1, \ldots, N-Q+1,$$

$$\mu_r = \frac{1}{N-Q+1} \sum_{t=1}^{N-Q+1} r_t,$$

$$\sigma_r^2 = \frac{1}{N-Q+1} \sum_{t=1}^{N-Q+1} (r_t - \mu_r)^2,$$

$$M = \{ t : r_t \ge \mu_r + \delta \sigma_r \}.$$

The vector $\mathbf{z}_t \subset Z$ is the embedded series of length $Q$, where $Q \le N$. The pattern factors $r_t$, $t = 1, \ldots, N-Q+1$, are elements of the vector $r \in R^{N-Q+1}$, which consists of the $N-Q+1$ inner products of the pattern wavelet $p$ and the embedded time series $\mathbf{z}_t$. Also, $\mu_r$ denotes the mean of $r$, $\sigma_r$ is the standard deviation of $r$, and $M$ is the pattern match set, which is defined as the set of all time instances where the pattern factor $r_t$ is greater than or equal to the threshold $\mu_r + \delta \sigma_r$. Finally, the pattern wavelet transform $f$ is defined as the mean of $g(\mathbf{z}_t)$ for $t \in M$:

$$f(p, \delta, Z, g) = \mu_M = \frac{1}{c(M)} \sum_{t \in M} g(\mathbf{z}_t), \qquad (2)$$

where $c(M)$ is the cardinality of $M$. Also, $\sigma_M$ is the standard deviation of $g(\mathbf{z}_t)$ at times $t \in M$:

$$\sigma_M^2 = \frac{1}{c(M)} \sum_{t \in M} \left( g(\mathbf{z}_t) - \mu_M \right)^2.$$

It should be noted that the selection of the fitness operator $g$ in (2) is problem specific and is independent of the algorithm. It should be chosen a priori based on the types of hidden temporal features to be discovered.
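The definitions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`embed`, `pattern_wavelet_transform`, `next_step_return`) are hypothetical, indices are 0-based rather than 1-based, the standard deviation is taken in its population form, and the example fitness operator (relative change from the last point of the window to the next point) is only one possible choice of g. Windows for which g is undefined are left out of the match set, and an empty match set is given a fitness of negative infinity; both are choices made for this sketch.

```python
from statistics import fmean, pstdev


def embed(z, q):
    """Return the N-Q+1 sub-series of length q embedded in z."""
    return [z[t:t + q] for t in range(len(z) - q + 1)]


def pattern_wavelet_transform(p, delta, z, g):
    """Return (mu_M, M): the mean fitness over the match set, and the set.

    p     -- pattern wavelet, a sequence of length Q
    delta -- threshold parameter
    z     -- time series, a list of floats
    g     -- fitness operator called as g(z, t, q); it may return None
             where it is undefined (assumption of this sketch)
    """
    q = len(p)
    # Pattern factors r_t: inner product of p with each embedded sub-series.
    r = [sum(pi * zi for pi, zi in zip(p, w)) for w in embed(z, q)]
    mu_r = fmean(r)
    sigma_r = pstdev(r, mu_r)  # population standard deviation (assumed form)
    threshold = mu_r + delta * sigma_r
    # Match set M: times whose pattern factor reaches the threshold.
    fitness = {t: g(z, t, q) for t, rt in enumerate(r) if rt >= threshold}
    matches = [t for t, value in fitness.items() if value is not None]
    if not matches:
        return float("-inf"), matches  # sketch choice for an empty match set
    return fmean(fitness[t] for t in matches), matches


def next_step_return(z, t, q):
    """Assumed g: relative change from the window's last point to the next."""
    if t + q >= len(z):
        return None  # no future point exists after this window
    return (z[t + q] - z[t + q - 1]) / z[t + q - 1]
```

Calling `pattern_wavelet_transform([-1.0, 1.0], 0.5, z, next_step_return)` on a short list `z` returns the mean next-step change after every window whose last point rises unusually far above its first, together with the 0-based start times of those windows.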

Because the maximization problem in (1) is complex and nonlinear, it is difficult to solve using traditional numerical optimization methods. To overcome these limitations, a roulette wheel based GA with elitism (Goldberg 1989) searches for the optimal $p$ and $\delta$. Ideally $p \in R^Q$ and $\delta \in R$; for efficiency purposes, $p \in [-\varepsilon, \varepsilon]^Q$ and $\delta \in [\delta_1, \delta_2]$. These ranges are discrete due to the nature of the GA, with a possible $2^b$ unique values, where $b$ is the number of bits used to represent $p_i$ and $\delta$. The parameters for the GA are $Q$, $Z$, $g$, $b$, and the population size. The parameter $b$ is usually in the range of 4 to 6 and the population size is set to 30. The most elite individual is maintained from generation to generation without change. No mutation is used. The GA is shown below.

Pattern Finding Genetic Algorithm
1. Create an elite population
   a) Randomly generate a large population (10 times normal population size)
   b) Calculate fitness
   c) Select the top 10th of the population to continue
2. While all fitness values have not converged
   a) Perform roulette selection, save elite individual
   b) Crossover population
   c) Calculate fitness

APPLICATION RESULTS

The goal of this application is to find hidden temporal patterns in a certain stock time series. Our experimental time series is the daily open stock price of Quantum (QNTM, traded on the NASDAQ), $Z = \{z_t,\ t = 1, \ldots, N\}$ with N = 3,76. See Figure 1 for illustration. Obviously, this time series is non-stationary. Our special interest is to identify the temporal pattern that is related to a significant price change.

Figure 1 - Quantum Corp stock time series

ARMA Model

Two ARMA models of the time series reveal essentially the same random walk characteristics. The models are

$$z_t = \phi z_{t-1} + \varepsilon_t \qquad (3)$$

$$z_t = (1 + \phi)\, z_{t-1} - \phi z_{t-2} + \varepsilon_t \qquad (4)$$

$$z_t = z_{t-1} + \varepsilon_t \qquad (5)$$

where $\phi = 0.99933$ in (3) and $\phi = 0.045948$ in (4). The $\phi$ in both models is statistically significant, but the autocorrelations of (3) show strong evidence of nonstationarity and the Ljung-Box test of its residuals indicates a lack of independence. For model (4), the Ljung-Box test of the residuals indicates independence. By setting $\phi = 1$ in (3) and $\phi = 0$ in (4), both models become equivalent to (5). The ARMA models provide little insight into hidden structure in the time series; the series is a random walk. On the other hand, the method presented by the authors finds statistically significant structure, as presented below.

Pattern Wavelet Model

In building the pattern wavelet model, the fitness operator $g$ in (2) is chosen as the relative price change immediately after the pattern,

$$g(\mathbf{z}_t) = \frac{z_{t+Q} - z_{t+Q-1}}{z_{t+Q-1}}.$$

In our case we want to find features that indicate a percentage price increase after the end of the pattern match. We found $c(M)$ to be between 38 and 34, depending on the support of the pattern wavelet. The statistics for eight patterns are given in Table 1. The change in the stock price after a pattern match was between +0.7% and +.5%, whereas the average change was +0.2%. This shows that there is a correlation between the patterns and the price changes. The standard deviation, though, is between 3% and 4% for the patterns and 3% for the average day. The $\mu_M$ of the matched patterns is between 5 to 2 higher than $\mu_{g(z)}$ of the whole time series.

Two statistical tests are used to show the significance of the results. The first test is the runs test. The test hypothesis is

H_0: There is no difference between the matched time series and the remaining time series.
H_A: There is a significant difference between the matched time series and the remaining time series.

Our test uses a 1% probability of Type I error ($\alpha = 0.01$). Table 1 shows that the null hypothesis can easily be rejected in all cases.

The second statistical test is the difference of two independent means. The two populations are the transformed series and the whole time series. Although the two populations are probably dependent, this can be ignored because it makes the statistics more conservative, i.e., it will tend to overestimate the Type I error. The test hypothesis is

H_0: $\mu_M - \mu_{g(z)} = 0$
H_A: $\mu_M - \mu_{g(z)} > 0$

This test also uses a 1% probability of Type I error ($\alpha = 0.01$). Again, Table 1 shows that the null hypothesis can be very confidently rejected for all the patterns. The mean fitness of the time series is $\mu_{g(z)} = 0.0079$, and $\sigma_{g(z)} = 0.03293$.
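The one-sided difference-of-means test described above can be sketched as follows. The paper does not state which test statistic was used, so this sketch assumes a large-sample z statistic with unpooled variances; the function name `one_sided_means_test` and its parameter names are hypothetical.

```python
import math


def one_sided_means_test(mu_m, sigma_m, n_m, mu_all, sigma_all, n_all):
    """Return (z, p) for H_0: mu_m - mu_all = 0 against H_A: mu_m - mu_all > 0.

    Assumes both samples are large enough for a normal approximation and
    treats them as independent, as the text does.
    """
    # Standard error of the difference of two means (unpooled variances).
    standard_error = math.sqrt(sigma_m ** 2 / n_m + sigma_all ** 2 / n_all)
    z = (mu_m - mu_all) / standard_error
    # Upper-tail probability of a standard normal: P(Z > z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value
```

The null hypothesis is rejected at the 1% level when the returned p-value falls below 0.01. The inputs correspond to $\mu_M$, $\sigma_M$, and $c(M)$ for the matched set, and to $\mu_{g(z)}$, $\sigma_{g(z)}$, and the series length for the whole time series.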

TABLE 1 - STATISTICAL SIGNIFICANCE OF RESULTS

Q     c(M)    µ_M        σ_M       Runs test α      Means test α
1     238     0.00736    0.0385    <1.00x10^-7      8.8x10^-3
2     67      0.00834    0.0375    <1.00x10^-7      7.58x10^-3
3     357     0.00746    0.0336    <1.00x10^-7      3.64x10^-4
4     85      0.0093     0.047     4.78x10^-10      5.30x10^-3
9     20      0.0057     0.046     <1.00x10^-7      8.28x10^-4
2     44      0.0397     0.0362    <1.00x10^-7      .5x10^-5
27    90      0.0276     0.0406    4.44x10^-6       5.55x10^-5
39    20      0.03       0.0348    <1.00x10^-7      2.56x10^-5

CONCLUSIONS

In this paper, a new method for temporal data mining is proposed. Using a pattern wavelet transform as a data mining tool has yielded meaningful results. Instead of forcing the wavelet to match everywhere, the transform matches only where there is a high similarity between the pattern wavelet and the underlying time series. To find such pattern wavelets, a genetic algorithm is used. Even with a complex, non-stationary time series like a stock price, the algorithm detected interesting patterns. Across all tested Q, the patterns found were statistically significant.

The algorithm is flexible in that, by using an alternative fitness function g, different structures can be found. The g used in this research was for positive changes, but one could just as easily use

$$g(\mathbf{z}_t) = -\frac{z_{t+Q} - z_{t+Q-1}}{z_{t+Q-1}},$$

which would find negative changes. Also, a more complicated g could be used that takes into account the standard deviations of the matches. Future research directions will include exploring combinations of patterns, looking for patterns in shorter segments of the time series, and adding additional factor dimensions such as volume.

REFERENCES

Bowerman, B. L., and O'Connell, R. T. (1993). Forecasting and Time Series: An Applied Approach, Duxbury Press, Belmont, California.
Ghoshray, S. (1996). Hybrid prediction technique by fuzzy inferencing on the chaotic nature of time series data. Artificial Neural Networks in Engineering, Proceedings, 725-730.
Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning, Addison-Wesley Pub. Co., Reading, Mass.
Lin, C. T., and Lee, C. S. G. (1996). Neural Fuzzy Systems - A Neuro-Fuzzy Synergism to Intelligent Systems, Prentice-Hall, Upper Saddle River, NJ.
Polikar, R. (1996). The Engineer's Ultimate Guide To Wavelet Analysis - The Wavelet Tutorial.
Weigend, A. S., and Gershenfeld, N. A. (1994). Time Series Prediction: Forecasting the Future and Understanding the Past, Addison-Wesley Pub. Co., Reading, MA.