Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases

Kaushik Chakrabarti, Microsoft Research, Redmond, WA 985, kaushik@microsoft.com
Eamonn Keogh, Univ. of California, Riverside, CA 95, eamonn@cs.ucr.edu
Sharad Mehrotra, Univ. of California, Irvine, CA 9697, sharad@ics.uci.edu
Michael Pazzani, Univ. of California, Irvine, CA 9697, pazzani@ics.uci.edu

Abstract

Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data. The most promising solutions involve performing dimensionality reduction on the data, then indexing the reduced data with a multidimensional index structure. Many dimensionality reduction techniques have been proposed, including Singular Value Decomposition (SVD), the Discrete Fourier Transform (DFT), and the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique which we call Adaptive Piecewise Constant Approximation (APCA). While previous techniques (e.g., SVD, DFT and DWT) choose a common representation for all the items in the database that minimizes the global reconstruction error, APCA approximates each time series by a set of constant value segments of varying lengths such that their individual reconstruction errors are minimal. We show how APCA can be indexed using a multidimensional index structure. We propose two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching: a lower bounding Euclidean distance approximation, and a non-lower bounding, but very tight, Euclidean distance approximation, and show how they can support fast exact searching, and even faster approximate searching, on the same index structure. We theoretically and empirically compare APCA to all the other techniques and demonstrate its superiority.

Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval] Search Process; H.2.4 [Systems] Multimedia databases.

Keywords: Indexing, dimensionality reduction, time-series similarity retrieval.

1. Introduction

Time series account for a large proportion of the data stored in financial, medical and scientific databases.
Recently there has been much interest in the problem of similarity search (query-by-content) in time series databases. Similarity search is useful in its own right as a tool for exploratory data analysis, and it is also an important element of many data mining applications such as clustering [3], classification [6, 33] and mining of association rules []. The similarity between two time series is typically measured with Euclidean distance, which can be calculated very efficiently. However, the volume of data typically encountered exacerbates the problem. Multi-gigabyte datasets are very common. As a typical example, consider the MACHO project. This database contains more than a terabyte of data and is updated at the rate of several gigabytes a day [48].

Contact author. [Address: One Microsoft Way, Redmond, WA 985-83, USA. Phone: (45)73-537 Fax: (45) 936-739]
Work done while author was Ph.D. student at University of Illinois at Urbana Champaign.
Work done while author was Ph.D. student at University of California at Irvine.

The most promising similarity search methods are techniques that perform dimensionality reduction on the data, then use a multidimensional index structure to index the data in the transformed space. The technique was introduced in [] and extended in [39, 3,]. The original work by Agrawal et al. utilizes the Discrete Fourier Transform (DFT) to perform the dimensionality reduction, but other techniques have been suggested, including Singular Value Decomposition (SVD) [8, 4, 3], the Discrete Wavelet Transform (DWT) [9, 49, ] and Piecewise Aggregate Approximation (PAA) [4, 5].

For a given index structure, the efficiency of indexing depends only on the fidelity of the approximation in the reduced dimensionality space. However, in choosing a dimensionality reduction technique, we cannot simply choose an arbitrary compression algorithm. What is required is a technique that produces an indexable representation. For example, many time series can be efficiently compressed by delta encoding, but this representation does not lend itself to indexing. In contrast SVD, DFT, DWT and PAA all lend themselves naturally to indexing, with each eigenwave, Fourier coefficient, wavelet coefficient or aggregate segment mapping onto one dimension of an index tree.

The main contribution of this paper is to propose a simple, but highly effective compression technique, Adaptive Piecewise Constant Approximation (APCA), and show that it can be indexed using a multidimensional index structure. This representation was considered by other researchers, but they suggested it does not allow for indexing due to its irregularity [5]. We will show that indexing APCA is possible, and that using APCA is up to one to two orders of magnitude more efficient than alternative techniques on real world datasets. We will define the APCA representation in detail in Section 3; however, an intuitive understanding can be gleaned from Figure 1.

[Figure 1 panels: APCA Representation, DFT, Haar Wavelet and SVD approximations of the same time series, each annotated with its reconstruction error.]

Figure 1: A visual comparison of the time series representation proposed in this work (APCA), and the 3 other representations advocated in the literature. For fair comparison, all representations have the same compression ratio.
The reconstruction error is the Euclidean distance between the original time series and its approximation.

There are many situations in which a user would be willing to sacrifice some accuracy for significant speedup [5]. With this in mind we introduce two distance measures defined on the APCA representation. The first tightly lower bounds the Euclidean distance metric and is used to produce exact nearest neighbors. The second is not lower bounding, but produces a very close approximation of Euclidean distance and can be used to quickly find approximate nearest neighbors. Both methods can be supported by the same index structure so that a user can switch between fast exact search and even faster approximate search. Additionally we will show that the APCA representation can support queries where the distance measure is an arbitrary $L_p$ norm (i.e. $p = 1, 2, \ldots, \infty$).

The rest of the paper is organized as follows. In Section 2 we provide background on and review related work in time series similarity search. In Section 3 we introduce the APCA representation, an algorithm to compute it efficiently and two distance measures defined on it. In Section 4 we demonstrate how to index the APCA representation. Section 5 contains a comprehensive experimental comparison of APCA with all the competing techniques. In Section 6 we discuss several advantages APCA has over the competing techniques, in addition to being faster. Section 7 offers conclusions and directions for future work.

2. Background and Related Work

Given two time series $Q = \{q_1, \ldots, q_n\}$ and $C = \{c_1, \ldots, c_n\}$, their Euclidean distance is defined as:

\[ D(Q,C) \equiv \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2} \qquad (1) \]

Figure 2 shows the intuition behind the Euclidean distance.

Figure 2: The intuition behind the Euclidean distance. The Euclidean distance can be visualized as the square root of the sum of the squared lengths of the gray lines.

There are essentially two ways the data might be organized [6]:

Whole Matching. Here it is assumed that all sequences to be compared are the same length.

Subsequence Matching. Here we have a query sequence Q (of length n), and a longer sequence C (of length m). The task is to find the subsequence in C of length n, beginning at $c_i$, which best matches Q, and report its offset within C.

Whole matching requires comparing the query sequence to each candidate sequence by evaluating the distance function and keeping track of the sequence with the lowest distance. Subsequence matching requires that the query Q be placed at every possible offset within the longer sequence C. Note it is possible to convert subsequence matching to whole matching by sliding a window of length n across C, and making copies of the m-n+1 windows. Figure 3 illustrates the idea. Although this causes storage redundancy it simplifies the notation and algorithms, so we will adopt this policy for the rest of this paper.

There are two important kinds of queries that we would like to support in time series databases: range queries (e.g., return all sequences within an epsilon of the query sequence) and nearest neighbor queries (e.g., return the K closest sequences to the query sequence). The brute force approach to answering these queries, sequential scanning, requires comparing every time series $C_i$ to Q. Clearly this approach is unrealistic for large datasets.

Any indexing scheme that does not examine the entire dataset could potentially suffer from two problems, false alarms and false dismissals. False alarms occur when objects that appear to be close in the index are actually distant. Because false alarms can be removed in a post-processing stage (by confirming distance estimates on the original data), they can be tolerated so long as they are relatively infrequent.
A false dismissal is when qualifying objects are missed because they appear distant in index space. We will refer to similarity-searching techniques that guarantee no false dismissals as exact, and techniques that do not have this guarantee as approximate. Approximate techniques can still be very useful for exploring large databases, particularly if the probability of false dismissal is low. We will review approximate techniques in Section 2.1 and exact techniques in Section 2.2.

Figure 3: The subsequence matching problem can be converted to the whole matching problem by sliding a "window" of length n across the long sequence and making copies of the data falling within the windows.
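The Euclidean distance of Equation 1 and the sliding-window conversion from subsequence matching to whole matching can be sketched as follows. This is a minimal illustration of the brute-force sequential scan, not the paper's implementation; the function names are hypothetical and chosen only for this example.

```python
import math

def euclidean(q, c):
    """Euclidean distance between two equal-length sequences (Equation 1)."""
    if len(q) != len(c):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((qi - ci) ** 2 for qi, ci in zip(q, c)))

def sliding_windows(c, n):
    """Copy out every length-n window of a longer sequence c (m - n + 1 windows)."""
    return [c[i:i + n] for i in range(len(c) - n + 1)]

def best_subsequence_match(q, c):
    """Sequential scan: offset and distance of the window of c closest to q."""
    dists = [euclidean(q, w) for w in sliding_windows(c, len(q))]
    best = min(range(len(dists)), key=dists.__getitem__)
    return best, dists[best]
```

The scan touches every window, which is exactly the cost that the indexing techniques discussed in this paper try to avoid.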

2.1 Approximate techniques for similarity searching

Several researchers have suggested abandoning the insistence on exact search in favor of a much faster search that returns approximately the same results. Typically this involves transforming the data with a lossy compression scheme, then doing a sequential search on the compressed data. Typical examples include [4, 7, 3, 46], who all utilize a piecewise linear approximation. Others have suggested transforming the data into a discrete alphabet and using string-matching algorithms [,, 34, 9,, 38]. All these approaches suffer from some limitations. They are all evaluated on small datasets residing in main memory, and it is unclear if they can be made to scale to large databases. Further, the systems are evaluated without considering precision and recall, thus we can say little or nothing about the quality of the returned answer set. The work of [3, 36, 45, 5, 6] differs from the above in that they focus on providing a more flexible query language and not on performance issues.

2.2 Exact techniques for similarity searching

A time series $C = \{c_1, \ldots, c_n\}$ with n datapoints can be considered as a point in n-dimensional space. This immediately suggests that time series could be indexed by a multidimensional index structure such as the R-tree and its many variants [7]. Since realistic queries typically contain 20 to 1,000 datapoints (i.e. n varies from 20 to 1000) and most multidimensional index structures have poor performance at dimensionalities greater than 8-12 [6], we need to first perform dimensionality reduction in order to exploit multidimensional index structures to index time series data. In [6] the authors introduced the GEneric Multimedia INdexIng method (GEMINI), which can exploit any dimensionality reduction method to allow efficient indexing. The technique was originally introduced for time series, but has been successfully extended to many other types of data [8].

An important result in [6] is that the authors proved that, in order to guarantee no false dismissals, the distance measure in the index space must satisfy the following condition:

\[ D_{\text{index space}}(A,B) \le D_{\text{true}}(A,B) \]

This theorem is known as the lower bounding lemma or the contractive property.
Given the lower bounding lemma, and the ready availability of off-the-shelf multidimensional index structures, GEMINI requires just the following three steps.

1. Establish a distance metric from a domain expert (in this case Euclidean distance).
2. Produce a dimensionality reduction technique that reduces the dimensionality of the data from n to N, where N can be efficiently handled by your favorite index structure.
3. Produce a distance measure defined on the N dimensional representation of the data, and prove that it obeys $D_{\text{index space}}(A,B) \le D_{\text{true}}(A,B)$.

Table 1 contains an outline of the GEMINI indexing algorithm. All sequences in the dataset C are transformed by some dimensionality reduction technique and then indexed by the index structure of choice. The indexing tree represents the transformed sequences as points in N dimensional space. Each point contains a pointer to the corresponding original sequence on disk.

We remove the mean (optionally) because, for many applications, we are only interested in the similarity based on the shape of the sequence and not its vertical offset from the x-axis. If the offset is not removed, it would dominate the Euclidean distance function, leading to unintuitive notions of similarity [9]. We remove the mean for the experiments in this paper. For the illustrative examples we use in this paper, we do not remove the mean, for simplicity.

Algorithm BuildIndex(C, n);   // C is the dataset, n is the size of the window
begin
  1. for all objects in database
  2.    C_i ← C_i - Mean(C_i);            // Optional: remove the mean of C_i
  3.    C_i' ← SomeTransformation(C_i);   // C_i' is any dimensionality reduced representation
  4.    Insert C_i' into the Spatial Access Method with a pointer to C_i on disk;
  5. endfor
end

Table 1: An outline of the GEMINI index building algorithm.

Note that each sequence has its mean subtracted before indexing. This has the effect of shifting the sequence in the y-axis such that its mean is zero, removing information about its offset. This step is included because for most applications the offset is irrelevant when computing similarity.

Table 2 below contains an outline of the GEMINI range query algorithm.

Algorithm RangeQuery(Q, ε)
begin
  1. Project the query Q into the same feature space as the index.
  2. Find all candidate objects in the index within ε of the query.
  3. Retrieve from disk the actual sequences pointed to by the candidates.
  4. Compute the actual distances, and discard false alarms.
end

Table 2: The GEMINI range query algorithm.

The range query algorithm is called as a subroutine in the K Nearest Neighbor algorithm outlined in Table 3. There are several optimizations to this basic K Nearest Neighbor algorithm that we utilize in this paper [4]. We will discuss them in more detail in Section 4.

Algorithm K_NearestNeighbor(Q, K)
begin
  1. Project the query Q into the same feature space as the index.
  2. Find the K nearest candidate objects in the index.
  3. Retrieve from disk the actual sequences pointed to by the candidates.
  4. Compute the actual distances and record the maximum, call it εmax.
  5. Issue the range query, RangeQuery(Q, εmax);
  6. Compute the actual distances, and choose the nearest K.
end

Table 3: The GEMINI nearest neighbor algorithm.

The efficiency of the GEMINI query algorithms depends only on the quality of the transformation used to build the index. The tighter the bound on $D_{\text{index space}}(A,B) \le D_{\text{true}}(A,B)$ the better, as tighter bounds imply fewer false alarms, hence lower query cost [7]. Time series are usually good candidates for dimensionality reduction because they tend to contain highly correlated features.
For brevity, we will not describe the three main dimensionality reduction techniques, SVD, DFT and DWT, in detail. Instead we refer the interested reader to the relevant papers or to [4], which contains a survey of all the techniques. We will briefly revisit related work in Section 6 when the reader has developed more intuition about our approach.
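The GEMINI range query and K nearest neighbor outlines can be sketched in a few lines. The sketch below rests on stated assumptions: it uses PAA (segment means) as the dimensionality reduction, with the standard PAA bound sqrt(n/N) times the Euclidean distance between the reduced vectors as the lower-bounding index-space distance, and it replaces the spatial access method with a plain list scanned linearly. All function names are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def paa(series, n_segments):
    """Mean of each equal-length segment; the length must divide evenly."""
    seg = len(series) // n_segments
    if seg * n_segments != len(series):
        raise ValueError("series length must be a multiple of n_segments")
    return [sum(series[i * seg:(i + 1) * seg]) / seg for i in range(n_segments)]

def paa_lower_bound(q_red, c_red, n):
    """Index-space distance for PAA; it never exceeds the true Euclidean distance."""
    return math.sqrt(n / len(q_red)) * euclidean(q_red, c_red)

def build_index(dataset, n_segments):
    # Stand-in for a spatial access method: (reduced point, pointer to original).
    return [(paa(c, n_segments), i) for i, c in enumerate(dataset)]

def range_query(q, eps, index, dataset, n_segments):
    q_red = paa(q, n_segments)
    # Candidates may include false alarms but, by the lower bounding lemma,
    # can never miss a qualifying sequence.
    candidates = [i for c_red, i in index
                  if paa_lower_bound(q_red, c_red, len(q)) <= eps]
    # Confirm on the original data and discard false alarms.
    return [i for i in candidates if euclidean(q, dataset[i]) <= eps]

def k_nearest(q, k, index, dataset, n_segments):
    q_red = paa(q, n_segments)
    nearest = sorted(index, key=lambda e: paa_lower_bound(q_red, e[0], len(q)))[:k]
    eps_max = max(euclidean(q, dataset[i]) for _, i in nearest)
    hits = range_query(q, eps_max, index, dataset, n_segments)
    return sorted(hits, key=lambda i: euclidean(q, dataset[i]))[:k]
```

Because the index-space distance is a lower bound, the candidate filter is safe for exact search; a tighter bound only reduces the number of candidates that must be confirmed against the original data.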

3. Adaptive Resolution Representation

In recent work Keogh et al. [4] and Yi & Faloutsos [5] independently suggested approximating a time series by dividing it into equal-length segments and recording the mean value of the datapoints that fall within the segment. The authors use different names for this representation ([4] calls it Piecewise Aggregate Approximation while [5] calls it Segmented-Means); we will refer to it as Piecewise Aggregate Approximation (PAA) in this paper. This simple technique is surprisingly competitive with the more sophisticated transforms.

The fact that each segment in PAA is the same length facilitates indexing of this representation. Suppose however we relaxed this requirement and allowed the segments to have arbitrary lengths; does this improve the quality of the approximation? Before we consider this question, we must remember that the approach that allows arbitrary length segments, which we call Adaptive Piecewise Constant Approximation (APCA), requires two numbers per segment. The first number records the mean value of all the datapoints in the segment, the second number records the length. So a fair comparison is N PAA segments to M APCA segments, where M = N/2.

It is difficult to make any intuitive guess about the relative performance of the two techniques. On one hand PAA has the advantage of having twice as many approximating segments. On the other hand APCA has the advantage of being able to place a single segment in an area of low activity and many segments in areas of high activity. In addition one has to consider the structure of the data in question. It is possible to construct artificial datasets where one approach has an arbitrarily large reconstruction error, while the other approach has a reconstruction error of zero.

Figure 4 illustrates a fair comparison between the two techniques on several real datasets. Note that for the task of indexing, subjective feelings about which technique "looks better" are irrelevant. All that matters is the quality of the approximation, which is given by the reconstruction error (because lower reconstruction errors result in tighter bounds on $D_{\text{index space}}(A,B) \le D_{\text{true}}(A,B)$).

[Figure 4 panels (A) to (F): each panel shows one time series with its APCA and PAA approximations and their reconstruction errors E.]

Figure 4: A comparison of the reconstruction errors of the equal-size segment approach (PAA) and the variable length segment approach (APCA), on a collection of miscellaneous datasets. A) INTERBALL Plasma processes. B) Darwin sea level pressures. C) Space Shuttle telemetry. D) Electrocardiogram. E) Manufacturing. F) Exchange rate.

On five of the six time series APCA outperforms PAA significantly. Only on the Exchange Rate data are they essentially the same. In fact, we repeated similar experiments for more than 4 different time series datasets, over a range of sequence lengths and compression ratios, and we found that APCA is always at least as good as PAA, and usually much better.

This comparison motivates our approach. If the APCA representation can be indexed, its high fidelity to the original signal should allow very efficient pruning of the index space (i.e. few false alarms, hence low query cost). We will show how APCA can be indexed in the next section (Section 4). In the rest of this section, we define the APCA representation formally, describe the algorithm to obtain the APCA representation of a time series and discuss the distance measures for APCA.

3.1 The APCA representation

Given a time series $C = \{c_1, \ldots, c_n\}$, we need to be able to produce an APCA representation, which we will write as $\bar{C}$:

\[ \bar{C} = \{\langle cv_1, cr_1 \rangle, \ldots, \langle cv_M, cr_M \rangle\}, \quad cr_0 = 0 \qquad (2) \]

where $cv_i$ is the mean value of the datapoints in the ith segment (i.e. $cv_i = \mathrm{mean}(c_{cr_{i-1}+1}, \ldots, c_{cr_i})$) and $cr_i$ is the right endpoint of the ith segment. We do not represent the length of the segments but record the locations of their right endpoints instead, for indexing reasons that will be discussed in Section 4. The length of the ith segment can be calculated as $cr_i - cr_{i-1}$. Figure 5 illustrates this notation.

Figure 5: A time series C and its APCA representation $\bar{C}$, with M = 4.

Symbols | Definitions
S | Number of objects in the database.
n | Length of time series (a.k.a. query length, original dimensionality).
$C = \{c_1, \ldots, c_n\}$ | A time series in a database, stored as a vector of length n.
$Q = \{q_1, \ldots, q_n\}$ | A query time series, represented as a vector of length n.
N | Dimensionality of index structure, with N << n.
M | Number of segments in APCA representation, with M = N/2.
$\bar{C} = \{\langle cv_1, cr_1 \rangle, \ldots, \langle cv_M, cr_M \rangle\}$ | An adaptive piecewise constant approximation of C, with $cv_i$ the mean value of the ith segment and $cr_i$ the right endpoint of the ith segment.
$Q' = \{\langle qv_1, qr_1 \rangle, \ldots, \langle qv_M, qr_M \rangle\}$ | Also an adaptive piecewise constant approximation, but obtained using a special algorithm as described in Equation 4.
$D(Q,C)$ | Euclidean distance.
$D_{AE}(Q,\bar{C})$ | A non-lower bounding approximation of Euclidean distance.
$D_{LB}(Q',\bar{C})$ or $D_{LB}(Q,\bar{C})$ | A lower bounding approximation of the Euclidean distance.
$cmax_i$, $cmin_i$ | The max and min values of APCA representation C in the ith segment.
$R = (L, H) = (\{l_1, \ldots, l_N\}, \{h_1, \ldots, h_N\})$ | MBR associated with a node (say U) of the index built on N-dimensional APCA space; $L = \{l_1, \ldots, l_N\}$ and $H = \{h_1, \ldots, h_N\}$ denote the lower and higher endpoints of the major diagonal of R.
$(\{cmin_1, cr_1, \ldots, cmin_M, cr_M\}, \{cmax_1, cr_1, \ldots, cmax_M, cr_M\})$ | APCA rectangle corresponding to APCA point C.
$G_i = \{G_i[1], G_i[2], G_i[3], G_i[4]\}$ | ith region associated with R; $G_i[1]$ and $G_i[3]$ are low and high bounds along the value axis; $G_i[2]$ and $G_i[4]$ are those along the time axis.
MINDIST(Q, R) | Minimum distance of MBR R from query time series Q.
MINDIST(Q, R, t) | Minimum distance of MBR R from Q at time instant t.
MINDIST(Q, G, t) | Minimum distance of region G from Q at time instant t.

Table 4: The notation used in this paper.

3.2 Obtaining the APCA representation

As mentioned before, the performance of the index structure built on the APCA representation defined in Equation 2 depends on how closely the APCA representation approximates the original signal. The closer the approximation, the fewer the false alarms and the better the performance of the index. We say that an M-segment APCA representation $\bar{C}$ of a time series C is optimal (in terms of the quality of approximation) iff $\bar{C}$ has the least reconstruction error among all possible M-segment APCA representations of C. Finding the optimal piecewise polynomial representation of a time series requires an $O(Mn^2)$ dynamic programming algorithm [5, 35]. This is too slow for high dimensional data. In this paper, we propose a new algorithm to produce almost optimal APCA representations in $O(n \log(n))$ time. The algorithm works by first converting the problem into a wavelet compression problem, for which there are well known optimal solutions, then converting the solution back to the APCA representation and (possibly) making minor modifications. The algorithm leverages the fact that the Haar wavelet transformation of a time series signal can be calculated in $O(n)$, and that an optimal reconstruction (i.e., having least reconstruction error) of the signal for any level of compression (i.e., #retained coefficients/n) can be obtained by sorting the coefficients in order of decreasing normalized magnitude, then truncating off the smaller coefficients [44]. The segments in the reconstructed signal may have approximate mean values (due to truncation); we replace them by the exact mean values to get a valid APCA representation as defined in Equation 2.

There are, however, two issues we must address before utilizing this approach.

1) The DWT is defined only for time series with a length that is an integer power of two, while n may not necessarily be a power of two. This problem can be solved easily by padding those time series with zeros, then truncating the corresponding segment after performing the DWT.

2) There is no exact mapping between the number of Haar coefficients retained and the number of segments in the APCA representation resulting from the reconstruction. For example a single coefficient Haar approximation could produce a 1, 2 or 3-segment APCA representation. Our solution is to keep the largest M coefficients; this will produce an APCA representation with the number of segments between M and 3M. If the number of segments is more than M, adjacent pairs of segments are merged until exactly M segments remain. The segment pairs targeted for merging are those that can be fused into a single segment with the minimum increase in reconstruction error.

Table 5 contains the outline of the algorithm.

Algorithm Compute_APCA(C, M)
begin
  1. if length(C) is not a power of two, pad it with zeros to make it so.
  2. Perform the Haar Discrete Wavelet Transform on C.
  3. Sort coefficients in order of decreasing normalized magnitude, truncate after M.
  4. Reconstruct approximation (APCA representation) of C from retained coeffs.
  5. If C was padded with zeros, truncate it to the original length.
  6. Replace approximate segment mean values with exact mean values.
  7. while the number of segments is greater than M
  8.    Merge the pair of segments that can be merged with least rise in error
  9. endwhile
end

Table 5: An algorithm to produce the APCA.

The parameter M should be chosen judiciously. If M is too large, the dimensionality N of the index structure (N = 2M) will be high, resulting in high query cost (due to the dimensionality curse). If M is too small, the reconstruction error may become large, leading to too many false positives and hence high query cost. So, M should be chosen such that the overall reconstruction error remains low without letting the dimensionality exceed the critical threshold of the index structure (above which it performs worse than sequential scan). The actual choice of M would depend on the dataset and the multidimensional index structure used.
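Steps 6 to 9 of Compute_APCA (exact segment means followed by greedy merging) can be sketched as below. This is an illustrative sketch only: it assumes the segment boundaries produced by the wavelet reconstruction are already available as a list of right endpoints, it recomputes errors naively rather than incrementally, and the function names are hypothetical.

```python
def exact_means(series, right_endpoints):
    """APCA pairs (cv_i, cr_i) for the given right endpoints, with cr_0 = 0."""
    apca, left = [], 0
    for cr in right_endpoints:
        segment = series[left:cr]
        apca.append((sum(segment) / len(segment), cr))
        left = cr
    return apca

def reconstruction_error(series, apca):
    """Euclidean distance between the series and its piecewise constant reconstruction."""
    total, left = 0.0, 0
    for cv, cr in apca:
        total += sum((x - cv) ** 2 for x in series[left:cr])
        left = cr
    return total ** 0.5

def merge_to_m_segments(series, right_endpoints, m):
    """Merge adjacent segments, cheapest pair first, until m segments remain."""
    ends = list(right_endpoints)

    def error_without(j):
        # Error of the representation after dropping the j-th boundary,
        # i.e. after fusing segments j and j + 1.
        trial = ends[:j] + ends[j + 1:]
        return reconstruction_error(series, exact_means(series, trial))

    while len(ends) > m:
        # The last endpoint is the end of the series and is never removed.
        del ends[min(range(len(ends) - 1), key=error_without)]
    return exact_means(series, ends)
```

Minimizing the total error after a merge is the same as minimizing the increase in error, since the error before the merge is identical for every candidate pair.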

We illustrate the working of the above algorithm using a numerical example.

Example 1 (Computing APCA representation). Let us consider a time series C = [7, 5, 5, 3, 3, 3, 4, 6]. Table 6 shows the Haar wavelet decomposition of the series. We start by pairwise averaging the values to get a new lower-resolution representation of the data with the following average values: [(7+5)/2, (5+3)/2, (3+3)/2, (4+6)/2] = [6, 4, 3, 5]. Obviously, some information is lost in this averaging process. To be able to reconstruct the original series, we need to also store the differences of the (second of the) averaged values from the computed pairwise average, i.e., [6-5, 4-3, 3-3, 5-6] = [1, 1, 0, -1]. These are called the detail coefficients. Applying the pairwise averaging and differencing process recursively on the lower-resolution array containing the averages, we get the full decomposition shown in Table 6.

Resolution | Averages | Detail Coefficients
3 | [7, 5, 5, 3, 3, 3, 4, 6] | -
2 | [6, 4, 3, 5] | [1, 1, 0, -1]
1 | [5, 4] | [1, -1]
0 | [4.5] | [0.5]

Table 6: Haar Wavelet Transform for APCA Computation

The wavelet transform $W_C$ of C consists of the single coefficient representing the overall average of the data values followed by the detail coefficients in the order of increasing resolution. So $W_C = [4.5, 0.5, 1, -1, 1, 1, 0, -1]$. To take into account the importance of a coefficient with regard to the reconstruction of the original series (i.e., the number of elements in the series it contributes to the reconstruction of), we normalize the transform by dividing each coefficient by $2^{l/2}$, where l is the level of resolution of the coefficient. So the normalized wavelet transform of C is $W'_C = [4.5, 0.5, 1/\sqrt{2}, -1/\sqrt{2}, 1/2, 1/2, 0, -1/2]$.

[Figure 6 panels: (a) the original series 7, 5, 5, 3, 3, 3, 4, 6; (b) four reconstructed segments with values 5.5, 3.5, 3.5, 5.5; (c) four segments with exact means $cv_1 = 6$, $cv_2 = 4$, $cv_3 = 3$, $cv_4 = 5$; (d) three segments with $cv_1 = 6$, $cv_2 = 3.5$, $cv_3 = 5$.]

Figure 6: Step-by-step working of the Compute_APCA algorithm. (a) Original time series C = [7, 5, 5, 3, 3, 3, 4, 6]. (b) Time series reconstructed from the M (3 in this case) best wavelet coefficients of C. The reconstructed series has 4 segments (segment boundaries indicated by dots). The mean value of each segment is shown just above the segment. (c) Reconstructed time series with approximate means replaced by exact means (the $cv_i$'s). (d) Final APCA representation obtained by merging segments 2 and 3 (to reduce the number of segments to M = 3).
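The decomposition in Table 6 and the reconstruction from the three largest normalized coefficients can be reproduced with a short sketch. It assumes the input length is a power of two and uses the averaging and differencing convention of this example (detail = pairwise average minus the second value of the pair); the function names are hypothetical.

```python
def haar_transform(series):
    """Overall average followed by detail coefficients in order of increasing resolution."""
    averages, details = list(series), []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # (a - b) / 2 equals the pairwise average minus the second value b.
        details = [(a - b) / 2 for a, b in pairs] + details
        averages = [(a + b) / 2 for a, b in pairs]
    return averages + details

def normalized(coeffs):
    """Divide each coefficient by 2^(l/2), l being its resolution level."""
    return [w / 2 ** (max(i.bit_length() - 1, 0) / 2) for i, w in enumerate(coeffs)]

def keep_largest(coeffs, m):
    """Zero out all but the m coefficients of largest normalized magnitude."""
    norm = normalized(coeffs)
    keep = set(sorted(range(len(coeffs)), key=lambda i: -abs(norm[i]))[:m])
    return [w if i in keep else 0.0 for i, w in enumerate(coeffs)]

def inverse_haar(coeffs):
    """Rebuild a series from the coefficient layout used by haar_transform."""
    averages, pos = [coeffs[0]], 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(averages)]
        averages = [v for a, d in zip(averages, details) for v in (a + d, a - d)]
        pos += len(details)
    return averages

C = [7, 5, 5, 3, 3, 3, 4, 6]
W = haar_transform(C)                        # [4.5, 0.5, 1, -1, 1, 1, 0, -1]
approx = inverse_haar(keep_largest(W, 3))    # piecewise constant approximation
```

For this series the retained coefficients are the first, third and fourth, and the reconstruction takes the values 5.5, 3.5, 3.5 and 5.5 on consecutive pairs of datapoints, matching Figure 6(b).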

Suppose M = 3. So we would retain the 3 coefficients with highest normalized magnitude, i.e., the first, third and fourth coefficients. Figure 6(a) and (b) show the original time series C and the time series reconstructed from those 3 coefficients respectively. Figure 6(c) shows the reconstructed time series with approximate segment mean values replaced by the exact ones. Finally, we need to merge one pair of segments to reduce the number of segments to M = 3; segments 2 and 3 are the best pair to merge, as merging them results in the minimum increase in reconstruction error. Figure 6(d) shows the final 3-segment APCA representation of C produced by the Compute_APCA algorithm.

We experimentally compared this algorithm with several of the heuristic merging algorithms [5, 35, 4] and found it is faster (at least 5 times faster for any length time series) and slightly superior in terms of reconstruction error.

3.3 Distance measures defined for the APCA representation

Suppose we have a time series C, which we convert to the APCA representation $\bar{C}$, and a query time series Q. Clearly, no distance measure defined between Q and $\bar{C}$ can be exactly equivalent to the Euclidean distance D(Q,C) (defined in Equation 1) because $\bar{C}$ generally contains less information than C. However, we will define two distance measures between Q and $\bar{C}$ that approximate D(Q,C). The first, $D_{AE}(Q,\bar{C})$, is designed to be a very tight approximation of the Euclidean distance, but may not always lower bound the Euclidean distance D(Q,C). The second, $D_{LB}(Q,\bar{C})$, is generally a less tight approximation of the Euclidean distance, but is guaranteed to lower-bound it, a property necessary to utilize the GEMINI framework. These distance measures are defined below; Figure 7 illustrates the intuition behind the formulas.

3.3.1 An approximate Euclidean measure $D_{AE}$

Given a query Q, in raw data format, and a time series C in the APCA representation $\bar{C}$, $D_{AE}(Q,\bar{C})$ is defined as:

\[ D_{AE}(Q,\bar{C}) \equiv \sqrt{\sum_{i=1}^{M} \sum_{k=1}^{cr_i - cr_{i-1}} \left(cv_i - q_{k + cr_{i-1}}\right)^2} \qquad (3) \]

This measure can be efficiently calculated in $O(n)$, and it tightly approximates the Euclidean distance; unfortunately it has a drawback which prevents its use for exact search.

Proposition 1: $D_{AE}(Q,\bar{C})$ does not satisfy the triangular inequality.

Proof: By counter example.
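The triangular inequality states that for any objects α, β and χ, $d(\alpha,\beta) \le d(\alpha,\chi) + d(\beta,\chi)$. In other words, if $D_{AE}$ obeys the triangle inequality, there can exist no objects A, B and C such that $D_{AE}(A,\mathbf{B}) > D_{AE}(A,\mathbf{C}) + D_{AE}(B,\mathbf{C})$. We prove our proposition by finding three such objects. Consider the time series A = {-1, -1, -2, 1, 2}, B = {1, 1, 0, -1, -1} and a third series C = {$c_1, \dots, c_5$} whose first two values sum to 1 and whose last three values sum to -1. The 2-segment APCA representations of A, B and C as produced by the Compute_APCA algorithm are $\mathbf{A}$ = {<-1,2>, <1/3,5>}, $\mathbf{B}$ = {<2/3,3>, <-1,5>} and $\mathbf{C}$ = {<1/2,2>, <-1/3,5>} respectively. According to Equation 3,

$$D_{AE}(A,\mathbf{B}) = \sqrt{(2/3-(-1))^2 + (2/3-(-1))^2 + (2/3-(-2))^2 + (-1-1)^2 + (-1-2)^2} = \sqrt{2.7777 + 2.7777 + 7.1111 + 4 + 9} \approx 5.0662$$

Similarly, $D_{AE}(A,\mathbf{C}) = 3.8079$ and $D_{AE}(B,\mathbf{C}) = 1.2247$. So $D_{AE}(A,\mathbf{C}) + D_{AE}(B,\mathbf{C}) = 5.0326$, and therefore $D_{AE}(A,\mathbf{B}) > D_{AE}(A,\mathbf{C}) + D_{AE}(B,\mathbf{C})$. This implies that $D_{AE}(Q,\mathbf{C})$ does not satisfy the triangular inequality.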
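The counterexample can be checked numerically. The following is a minimal sketch, not the authors' code: the function name `d_ae` and the encoding of an APCA representation as a list of `(cv, cr)` pairs are illustrative choices, and the series values for A and B are assumed to be A = {-1, -1, -2, 1, 2} and B = {1, 1, 0, -1, -1}. Only the APCA representation of C is needed, so its raw values are not used.

```python
import math

def d_ae(q, apca):
    """Approximate Euclidean distance of Equation 3.

    q    : raw query series (list of numbers)
    apca : list of (cv, cr) pairs, cr being the 1-based right endpoint
           of each segment (hypothetical encoding chosen for this sketch)
    """
    total, start = 0.0, 0
    for cv, cr in apca:
        # compare every raw point of the segment with the segment mean
        total += sum((cv - x) ** 2 for x in q[start:cr])
        start = cr
    return math.sqrt(total)

A = [-1, -1, -2, 1, 2]
B = [1, 1, 0, -1, -1]
B_apca = [(2 / 3, 3), (-1, 5)]
C_apca = [(1 / 2, 2), (-1 / 3, 5)]

d_ab = d_ae(A, B_apca)
d_ac = d_ae(A, C_apca)
d_bc = d_ae(B, C_apca)
violates_triangle_inequality = d_ab > d_ac + d_bc
```

Under these assumptions the three distances come out at roughly 5.0662, 3.8079 and 1.2247, so the left-hand side exceeds the sum on the right.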

The failure of $D_{AE}$ to obey the triangular inequality means that it may not lower bound the Euclidean distance and thus cannot be used for exact indexing [5]. However, we will demonstrate later that it is very useful for approximate search.

3.3.2 A lower-bounding measure $D_{LB}$

To define $D_{LB}(Q',\mathbf{C})$ we must first introduce a special version of the APCA. Normally the algorithm mentioned in Section 3.2 is used to obtain this representation. However, we can also obtain this representation by projecting the endpoints of $\mathbf{C}$ onto $Q$, and finding the mean value of the sections of $Q$ that fall within the projected intervals. A time series $Q$ converted into the APCA representation this way is denoted as $Q'$. The idea can be visualized in Figure 7 III. $Q'$ is defined as:

$$Q' = \{\langle qv_1, qr_1\rangle, \dots, \langle qv_M, qr_M\rangle\}, \quad \text{where } qr_i = cr_i \text{ and } qv_i = \mathrm{mean}(q_{cr_{i-1}+1}, \dots, q_{cr_i}) \qquad (4)$$

$D_{LB}(Q',\mathbf{C})$ is defined as:

$$D_{LB}(Q',\mathbf{C}) \equiv \sqrt{\sum_{i=1}^{M} (cr_i - cr_{i-1})(qv_i - cv_i)^2} \qquad (5)$$

Figure 7: A visualization of the two distance measures defined on the APCA representation. I) A query time series $Q$ and an APCA object $\mathbf{C}$. II) The $D_{AE}$ measure can be visualized as the Euclidean distance between $Q$ and the reconstruction of $\mathbf{C}$. III) $Q'$ is obtained by projecting the endpoints of $\mathbf{C}$ onto $Q$ and calculating the mean values of the sections falling within the projected lines. IIII) The $D_{LB}$ measure can be visualized as the square root of the sum of the product of the squared length of the gray lines with the length of the segments they join.

We illustrate the computation of $D_{LB}(Q',\mathbf{C})$ using a numerical example below.

Example 2 (Computation of $D_{LB}(Q',\mathbf{C})$): Let us consider a time series A = {4, 6, 1, 0, 2}. The 2-segment APCA representation of A as produced by the Compute_APCA algorithm is $\mathbf{A}$ = {<5,2>, <1,5>}. Let Q = {5, 3, 5, 6, 7} be a query time series. To compute $D_{LB}(Q',\mathbf{A})$, we first compute $Q'$ = {<4,2>, <6,5>}. Then $D_{LB}(Q',\mathbf{A}) = \sqrt{2(4-5)^2 + 3(6-1)^2} = 8.775$. Note that $D_{LB}(Q',\mathbf{A})$ lower bounds $D(Q,A) = \sqrt{(5-4)^2 + (3-6)^2 + (5-1)^2 + (6-0)^2 + (7-2)^2} = 9.327$. The formal proof is shown below.

Lemma 1: $D_{LB}(Q',\mathbf{C})$ lower bounds the Euclidean distance $D(Q,C)$.

Proof: We present a proof for the case where there is a single segment in the APCA representation.
The more general proof for the M-segment case can be obtained by applying the proof to each of the M segments. Let $W = \{w_1, w_2, \dots, w_p\}$ be a vector of $p$ real numbers. Let $\bar{W}$ denote the arithmetic mean of $W$, i.e., $\bar{W} = (\sum w_i)/p$. We define a vector $\Delta W$ of real numbers where $\Delta w_i \equiv \bar{W} - w_i$. It is easy to see that $\sum \Delta w_i = 0$. The definition of $\Delta w_i$ allows us to substitute $w_i$ by $\bar{W} - \Delta w_i$, a fact which we will utilize in the proof below. Let $Q$ and $C$ be the query and data time series respectively, with $|Q| = |C| = n$. Let $Q'$ and $\mathbf{C}$ be the corresponding APCA vectors as defined in Equations 4 and 2 respectively.

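We want to prove

$$\sqrt{\sum_{i=1}^{n}(q_i - c_i)^2} \ \ge\ \sqrt{\sum_{i=1}^{M}(cr_i - cr_{i-1})(qv_i - cv_i)^2}$$

Because we are considering just the single segment case, we can remove the summation over M segments and rewrite the inequality step by step:

Assume: $\sqrt{\sum_{i=1}^{n}(q_i - c_i)^2} \ge \sqrt{(cr_i - cr_{i-1})(qv_i - cv_i)^2}$

Because $(cr_i - cr_{i-1}) = n$: $\sqrt{\sum_{i=1}^{n}(q_i - c_i)^2} \ge \sqrt{n\,(qv_i - cv_i)^2}$

Because the terms under the radicals must be nonnegative, we can square both sides: $\sum_{i=1}^{n}(q_i - c_i)^2 \ge n\,(qv_i - cv_i)^2$

$qv_i$ is simply the mean of $Q$, so rewrite it as $\bar{Q}$; $cv_i$ is simply the mean of $C$, so rewrite it as $\bar{C}$: $\sum_{i=1}^{n}(q_i - c_i)^2 \ge n\,(\bar{Q} - \bar{C})^2$

Substitute the rearrangement of the definition above: $\sum_{i=1}^{n}\big((\bar{Q} - \Delta q_i) - (\bar{C} - \Delta c_i)\big)^2 \ge n\,(\bar{Q} - \bar{C})^2$

Rearrange terms: $\sum_{i=1}^{n}\big((\bar{Q} - \bar{C}) - (\Delta q_i - \Delta c_i)\big)^2 \ge n\,(\bar{Q} - \bar{C})^2$

Binomial theorem: $\sum_{i=1}^{n}\big[(\bar{Q} - \bar{C})^2 - 2(\bar{Q} - \bar{C})(\Delta q_i - \Delta c_i) + (\Delta q_i - \Delta c_i)^2\big] \ge n\,(\bar{Q} - \bar{C})^2$

Distributive law and summation properties: $n(\bar{Q} - \bar{C})^2 - 2(\bar{Q} - \bar{C})\sum_{i=1}^{n}(\Delta q_i - \Delta c_i) + \sum_{i=1}^{n}(\Delta q_i - \Delta c_i)^2 \ge n\,(\bar{Q} - \bar{C})^2$

Associative law: $n(\bar{Q} - \bar{C})^2 - 2(\bar{Q} - \bar{C})\big(\sum_{i=1}^{n}\Delta q_i - \sum_{i=1}^{n}\Delta c_i\big) + \sum_{i=1}^{n}(\Delta q_i - \Delta c_i)^2 \ge n\,(\bar{Q} - \bar{C})^2$

$\sum \Delta w_i = 0$, proved above: $n(\bar{Q} - \bar{C})^2 + \sum_{i=1}^{n}(\Delta q_i - \Delta c_i)^2 \ge n\,(\bar{Q} - \bar{C})^2$

Subtract like term from both sides: $\sum_{i=1}^{n}(\Delta q_i - \Delta c_i)^2 \ge 0$

The sum of squares must be nonnegative, so our assumption was true. Hence the proof.

3.4 Quality of proposed distance measures $D_{LB}$ and $D_{AE}$

The quality of a lower bounding distance measure can be gauged by how tightly it lower bounds the true distance between all queries of interest and all objects in the database (because all queries of interest cannot be known in advance and the database may be very large, a large random sampling must suffice). For a non-lower bounding measure, the quality is a little more difficult to define. Intuitively we want the measure to tightly approximate the true distance but only rarely overestimate it. In addition, when the true distance is overestimated, it should not be by a large amount.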

We devised a simple experiment to illustrate the quality of $D_{LB}$ and $D_{AE}$ compared to the DWT (Haar) and the DFT approaches. We randomly extracted two sequences A and B from a database and measured the true Euclidean distance D(A,B) between them. We also measured the distance between A and B using the various reduced dimensionality representations for a fixed value of N. The ratio of the estimated distance over the true distance for all combinations was used to plot a point in 2-space, as illustrated in Figure 8.

[Figure 8 panels: samples of the Electrocardiogram data and the Star Light Curve data; scatter plots "Haar vs APCA" and "DFT vs APCA" for each dataset. Axes: ratio of estimated distance over true Euclidean distance for the competing approaches (Haar wavelet and DFT), and the same ratio for $D_{LB}$ and for $D_{AE}$. Annotated regions: where APCA has more pruning power, where APCA has less pruning power, and where APCA violates the lower bounding lemma.]

Figure 8: A visualization of the pruning power of the two distance measures defined on APCA as compared to the pruning power of the Haar wavelet and DFT approaches. Points on the diagonal indicate that the two approaches being compared have the same pruning power, points below the diagonal indicate that APCA is superior and points above the diagonal indicate that APCA's rival is superior.

We repeated this […] times with randomly chosen sequences for each of two datasets. DWT and DFT behaved similarly, so for brevity we will only discuss the comparisons between DWT and the two measures defined on APCA. For the Electrocardiogram dataset $D_{LB}$ produces tighter lower bounds than the Haar wavelet approach 99.9% of the time, and the difference is usually quite significant. The $D_{AE}$ measure very tightly approximated the true distance, and only violated lower bounding […].9% of the time, generally by a very small amount. For the Star Light Curve dataset $D_{LB}$ produces tighter lower bounds than the Haar wavelet approach 8[…].3% of the time.
The $D_{AE}$ measure only violates lower bounding […]% of the time and generally is an extremely tight approximation of the true distance. The quality of these results strongly suggests that APCA would be superior to existing approaches if indexable. We will address this issue in the next section.
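The lower-bounding measure of Section 3.3.2 can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: the names `project_query`, `d_lb` and `euclid` are hypothetical, an APCA representation is encoded as a list of `(cv, cr)` pairs with 1-based right endpoints, and the example values assume A = {4, 6, 1, 0, 2} with APCA representation {<5,2>, <1,5>} and Q = {5, 3, 5, 6, 7}.

```python
import math

def project_query(q, apca):
    """Build Q' (Equation 4): project the segment endpoints of the APCA
    representation onto q and take the mean of q inside each interval."""
    projected, start = [], 0
    for _cv, cr in apca:
        segment = q[start:cr]
        projected.append((sum(segment) / len(segment), cr))
        start = cr
    return projected

def d_lb(q, apca):
    """Lower-bounding distance of Equation 5."""
    total, prev = 0.0, 0
    for (qv, cr), (cv, _cr) in zip(project_query(q, apca), apca):
        total += (cr - prev) * (qv - cv) ** 2   # segment length times squared gap
        prev = cr
    return math.sqrt(total)

def euclid(q, c):
    """Plain Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))

A = [4, 6, 1, 0, 2]
A_apca = [(5, 2), (1, 5)]
Q = [5, 3, 5, 6, 7]

q_projected = project_query(Q, A_apca)   # [(4.0, 2), (6.0, 5)]
lower_bound = d_lb(Q, A_apca)            # sqrt(77)
true_distance = euclid(Q, A)             # sqrt(87)
```

With these values the lower bound is about 8.775 against a true distance of about 9.327, which is the kind of ratio plotted in Figure 8.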

4. Indexing the APCA representation

The APCA representation proposed in Section 3.1 defines an N-dimensional feature space (N = 2M). In other words, the proposed representation maps each time series $C = \{c_1, \dots, c_n\}$ to a point $\mathbf{C} = \{cv_1, cr_1, \dots, cv_M, cr_M\}$ in an N-dimensional space. We refer to the N-dimensional space as the APCA space and the points in the APCA space as APCA points. In this section, we discuss how we can index the APCA points using a multidimensional index structure (e.g., R-tree) and use the index to answer range and K nearest neighbor (K-NN) queries efficiently. We will concentrate on K-NN queries in this section; range queries will be discussed briefly at the end of the section.

Algorithm ExactKNNSearch(Q, K)
Variable queue: MinPriorityQueue;
Variable list: temp;
begin
 1. queue.push(root_node_of_index, 0);
 2. while not queue.IsEmpty() do
 3.    top = queue.Top();
 4.    for each time series C in temp such that D(Q,C) ≤ top.dist
 5.       Remove C from temp;
 6.       Add C to result;
 7.       if |result| = K return result;
 8.    endfor
 9.    queue.Pop();
10.    if top is an APCA point C
11.       Retrieve full time series C from database;
12.       temp.insert(C, D(Q,C));
13.    else if top is a leaf node
14.       for each data item C in top
15.          queue.push(C, D_LB(Q',C));
16.       endfor
17.    else // top is a non-leaf node
18.       for each child node U in top
19.          queue.push(U, MINDIST(Q,R)) // R is the MBR associated with U
20.       endfor
21.    endif
22. enddo
end

Table 6: K-NN algorithm to compute the exact K nearest neighbors of a query time series Q using a multidimensional index structure

A K-NN query (Q, K) with query time series Q and desired number of neighbors K retrieves a set $\mathcal{C}$ of K time series such that for any two time series $C \in \mathcal{C}$, $E \notin \mathcal{C}$, $D(Q,C) \le D(Q,E)$. The algorithm for answering K-NN queries using a multidimensional index structure is shown in Table 6 (see the footnote on feature-based index structures). The above algorithm is an optimization on the GEMINI K-NN algorithm described in Table 3 and was proposed in [4]. Like the basic K-NN algorithm [9, 4], the algorithm uses a priority queue queue to navigate nodes/objects in the index in the increasing order of their distances from Q in the indexed (i.e., APCA) space. The distance of an object (i.e., APCA point) $\mathbf{C}$ from Q is defined by $D_{LB}(Q',\mathbf{C})$ (cf. Section 3.3.2)
while the distance of a node U from Q is defined by the minimum distance MINDIST(Q,R) of the minimum bounding rectangle (MBR) R associated with U from Q (the definition of MINDIST will be discussed later). Initially, we push the root node of the index into the queue (Line 1).

Footnote: In this paper, we restrict our discussion to only feature-based index structures, i.e., multidimensional index structures that recursively cluster points using minimum bounding rectangles (MBRs). Examples of such index structures are R-tree, X-tree and Hybrid Tree. Note that the MBR-based clustering can be logical, i.e., the index structure need not store the MBRs physically as long as they can be derived from the physically stored information. For example, space partitioning index structures like the hB-tree and the Hybrid Tree store the partitioning information inside the index nodes as kd-trees [4, 6]. Since the MBRs can be derived from the kd-trees, the techniques discussed here are applicable to such index structures [6].

Subsequently, the algorithm navigates the index by

popping out the item from the top of queue at each step (Line 9). If the popped item is an APCA point $\mathbf{C}$, we retrieve the original time series $C$ from the database, compute its exact distance $D(Q,C)$ from the query and insert it into a temporary list temp (Lines 10-12). If the popped item is a node of the index structure, we compute the distance of each of its children from Q and push them into queue (Lines 13-21). We move a time series $C$ from temp to result only when we are sure that it is among the K nearest neighbors of Q, i.e., there exists no object $E \notin$ result such that $D(Q,E) < D(Q,C)$ and |result| < K. The second condition is ensured by the exit condition in Line 7. The first condition can be guaranteed as follows. Let I be the set of APCA points retrieved so far using the index (i.e., I = temp ∪ result). If we can guarantee that $\forall \mathbf{C} \in I, \forall E \notin I$, $D_{LB}(Q',\mathbf{C}) \le D(Q,E)$, then the condition $D(Q,C) \le$ top.dist in Line 4 would ensure that there exists no unexplored time series E such that $D(Q,E) < D(Q,C)$. By inserting the time series in temp (i.e., already explored objects) into result in increasing order of their distances $D(Q,C)$ (by keeping temp sorted by $D(Q,C)$), we can ensure that there exists no explored object E such that $D(Q,E) < D(Q,C)$. In other words, if $\forall \mathbf{C} \in I, \forall E \notin I$, $D_{LB}(Q',\mathbf{C}) \le D(Q,E)$, the above algorithm would return the correct answer.

Before we can use the above algorithm, we need to describe how to compute MINDIST(Q,R) such that the correctness requirement is satisfied, i.e., $\forall \mathbf{C} \in I, \forall E \notin I$, $D_{LB}(Q',\mathbf{C}) \le D(Q,E)$. We now discuss how the MBRs are computed and how to compute MINDIST(Q,R) based on the MBRs. We start by revisiting the traditional definition of an MBR [7]. Let us assume we have built an index of the APCA points by simply inserting the APCA points $\mathbf{C} = \{cv_1, cr_1, \dots, cv_M, cr_M\}$ into an MBR-based multidimensional index structure (using the insert function of the index structure). Let U be a leaf node of the above index. Let R = (L, H) be the MBR associated with U, where $L = \{l_1, l_2, \dots, l_N\}$ and $H = \{h_1, h_2, \dots, h_N\}$ are the lower and higher endpoints of the major diagonal of R. By definition, R is the smallest rectangle that spatially contains each APCA point $\mathbf{C} = \{cv_1, cr_1, \dots, cv_M, cr_M\}$ stored in U. Formally, R = (L, H) is defined as:

Definition 4.1
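(Old definition of MBR)

$$l_i = \begin{cases} \min_{\mathbf{C} \text{ in } U} cv_{(i+1)/2} & \text{if } i \text{ is odd} \\ \min_{\mathbf{C} \text{ in } U} cr_{i/2} & \text{if } i \text{ is even} \end{cases} \qquad h_i = \begin{cases} \max_{\mathbf{C} \text{ in } U} cv_{(i+1)/2} & \text{if } i \text{ is odd} \\ \max_{\mathbf{C} \text{ in } U} cr_{i/2} & \text{if } i \text{ is even} \end{cases} \qquad (6)$$

The MBR associated with a non-leaf node would be the smallest rectangle that spatially contains the MBRs associated with its immediate children [7].

[Figure 9: a time series with four segments, with $cmin_1, \dots, cmin_4$ and $cmax_1, \dots, cmax_4$ marked on each segment.]

Figure 9: Definition of $cmin_i$ and $cmax_i$ for computing MBRs

However, if we build the index as above (i.e., the MBRs are computed as in Definition 4.1), it is not possible to define a MINDIST(Q,R) that satisfies the correctness criteria. To overcome the problem, we define the MBRs as follows. Let us consider the MBR R of a leaf node U. For any APCA point $\mathbf{C} = \{cv_1, cr_1, \dots, cv_M, cr_M\}$ stored in node U, let $cmin_i$ and $cmax_i$ denote the minimum and maximum values of the corresponding time series $C$ among the datapoints in the i-th segment, i.e.,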

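$$cmin_i = \min_{t = cr_{i-1}+1}^{cr_i} (c_t) \quad \text{and} \quad cmax_i = \max_{t = cr_{i-1}+1}^{cr_i} (c_t) \qquad (7)$$

The $cmin_i$ and $cmax_i$ of a simple time series with 4 segments are shown in Figure 9. We define the MBR R = (L, H) associated with U as follows:

Definition 4.2 (New definition of MBR)

$$l_i = \begin{cases} \min_{\mathbf{C} \text{ in } U} cmin_{(i+1)/2} & \text{if } i \text{ is odd} \\ \min_{\mathbf{C} \text{ in } U} cr_{i/2} & \text{if } i \text{ is even} \end{cases} \qquad h_i = \begin{cases} \max_{\mathbf{C} \text{ in } U} cmax_{(i+1)/2} & \text{if } i \text{ is odd} \\ \max_{\mathbf{C} \text{ in } U} cr_{i/2} & \text{if } i \text{ is even} \end{cases} \qquad (8)$$

As before, the MBR associated with a non-leaf node is defined as the smallest rectangle that spatially contains the MBRs associated with its immediate children.

How do we build the index such that the MBRs satisfy Definition 4.2? We insert rectangles instead of the APCA points. In order to insert an APCA point $\mathbf{C} = \{cv_1, cr_1, \dots, cv_M, cr_M\}$, we insert a rectangle $\mathcal{C} = (\{cmin_1, cr_1, \dots, cmin_M, cr_M\}, \{cmax_1, cr_1, \dots, cmax_M, cr_M\})$ (i.e., $\{cmin_1, cr_1, \dots, cmin_M, cr_M\}$ and $\{cmax_1, cr_1, \dots, cmax_M, cr_M\}$ are the lower and higher endpoints of the major diagonal of $\mathcal{C}$) into the multidimensional index structure (using the insert function of the index structure). Since the insertion algorithm ensures that the MBR R of a leaf node U spatially contains all the $\mathcal{C}$'s stored in U, R satisfies Definition 4.2 (as illustrated in Example 3 below). The same is true for MBRs associated with non-leaf nodes.

Example 3 (Computation of MBRs): Let us consider two time series A = {4, 6, 1, 0, 2} and B = {4, 3, 5, 1, 3}. The 2-segment APCA representations of A and B as produced by the Compute_APCA algorithm are $\mathbf{A} = \{\langle av_1, ar_1\rangle, \langle av_2, ar_2\rangle\}$ = {<5,2>, <1,5>} and $\mathbf{B} = \{\langle bv_1, br_1\rangle, \langle bv_2, br_2\rangle\}$ = {<4,3>, <2,5>} respectively. For A, $amin_1$ = min(4,6) = 4, $amax_1$ = max(4,6) = 6, $amin_2$ = min(1,0,2) = 0, $amax_2$ = max(1,0,2) = 2. For B, $bmin_1$ = min(4,3,5) = 3, $bmax_1$ = max(4,3,5) = 5, $bmin_2$ = min(1,3) = 1, $bmax_2$ = max(1,3) = 3. So the APCA rectangles are $\mathcal{A} = (\{amin_1, ar_1, amin_2, ar_2\}, \{amax_1, ar_1, amax_2, ar_2\})$ = ({4, 2, 0, 5}, {6, 2, 2, 5}) and $\mathcal{B} = (\{bmin_1, br_1, bmin_2, br_2\}, \{bmax_1, br_1, bmax_2, br_2\})$ = ({3, 3, 1, 5}, {5, 3, 3, 5}). Since the MBR R of $\mathcal{A}$ and $\mathcal{B}$ is the smallest rectangle that spatially contains $\mathcal{A}$ and $\mathcal{B}$, R = ({min($amin_1$, $bmin_1$), min($ar_1$, $br_1$), min($amin_2$, $bmin_2$), min($ar_2$, $br_2$)}, {max($amax_1$, $bmax_1$), max($ar_1$, $br_1$), max($amax_2$, $bmax_2$), max($ar_2$, $br_2$)}), which satisfies Definition 4.2.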
To complete the example, R = ({min(4,3), min(2,3), min(0,1), min(5,5)}, {max(6,5), max(2,3), max(2,3), max(5,5)}) = ({3, 2, 0, 5}, {6, 3, 3, 5}).

Since we use one of the existing multidimensional index structures to build the APCA index, the storage organization of the nodes follows that of the index structure (e.g., <MBR, child_ptr> array if R-tree is used, kd-tree if hybrid tree is used). For the leaf nodes, we need to store the $cv_i$'s of each data point (in addition to the $cmax_i$'s, $cmin_i$'s and $cr_i$'s) since they are needed to compute $D_{LB}$ (Line 15 of the K-NN algorithm in Table 6). The index can be optimized (in terms of leaf node fanout) by not storing the $cmax_i$'s and $cmin_i$'s of the data points at the leaf nodes, i.e., just storing the $cv_i$'s and $cr_i$'s (a total of 2M numbers) per data point in addition to the tuple identifier. The reason is that the $cmax_i$'s and $cmin_i$'s are not required for computing $D_{LB}$, and hence are not used by the K-NN algorithm. They are needed just to compute the MBRs properly (according to Definition 4.2) at the time of insertion. The only time they are needed later (after the time of insertion) is during the recomputation of the MBR of the leaf node containing the data point after a node split. The insert function of the index structure can be easily modified to fetch the $cmax_i$'s

and $cmin_i$'s of the necessary data points from the database (using the tuple identifiers) on such occasions. The small extra cost of such fetches during node splits is worth the improvement in search performance due to higher leaf node fanout. We have applied this optimization in the index structure for our experiments, but we believe the APCA index would work well even without this optimization.

Once we have built the index as above (i.e., the MBRs satisfy Definition 4.2), we define the minimum distance MINDIST(Q,R) of the MBR R associated with a node U of the index structure from the query time series Q. For correctness, $\forall \mathbf{C} \in I, \forall E \notin I$, $D_{LB}(Q',\mathbf{C}) \le D(Q,E)$ (where I denotes the set of APCA points retrieved using the index at any stage of the algorithm). We show that the above correctness criterion is satisfied if MINDIST(Q,R) lower bounds the Euclidean distance D(Q,C) of Q from any time series C placed under U in the index.

Lemma 2: If MINDIST(Q,R) ≤ D(Q,C) for any time series C placed under U, the algorithm in Table 6 is correct, i.e., $\forall \mathbf{C} \in I, \forall E \notin I$, $D_{LB}(Q',\mathbf{C}) \le D(Q,E)$, where I denotes the set of APCA points retrieved using the index at any stage of the algorithm.

Proof: According to the K-NN algorithm, any item $E \notin I$ must satisfy one of the following conditions:
1) E has been inserted into the queue but has not been popped yet, i.e., $\forall \mathbf{C} \in I$, $D_{LB}(Q',\mathbf{C}) \le D_{LB}(Q',\mathbf{E})$.
2) E has not yet been inserted into the queue, i.e., there exists a parent node U of E whose MBR R satisfies the following condition: $\forall \mathbf{C} \in I$, $D_{LB}(Q',\mathbf{C}) \le$ MINDIST(Q,R).
Since $D_{LB}(Q',\mathbf{E}) \le D(Q,E)$ (Lemma 1), (1) implies $\forall \mathbf{C} \in I$, $D_{LB}(Q',\mathbf{C}) \le D(Q,E)$. If MINDIST(Q,R) ≤ D(Q,E) for any time series E under U, (2) implies that $\forall \mathbf{C} \in I$, $D_{LB}(Q',\mathbf{C}) \le D(Q,E)$. Since either (1) or (2) must be true for any item $E \notin I$, $\forall \mathbf{C} \in I, \forall E \notin I$, $D_{LB}(Q',\mathbf{C}) \le D(Q,E)$.

A trivial definition of MINDIST(Q,R) that lower bounds D(Q,C) for any time series C under U is MINDIST(Q,R) = 0 for all Q and R. However, this definition is too conservative and would cause the K-NN algorithm to visit all nodes of the index structure before returning any answer (thus defeating the purpose of indexing). The larger the MINDIST, the more nodes the K-NN algorithm can prune and the better the performance (see the footnote on what MINDIST has to lower bound). We provide such a definition of MINDIST below (see the footnote on pluggable distance functions).
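The control flow of Table 6 can be illustrated with a deliberately simplified sketch. It assumes a flat "index" consisting of a single leaf that holds every APCA point, so internal nodes and MINDIST are omitted; only the interplay between the priority queue, the temporary list and the lower bound is shown. The function names and the `(cv, cr)` encoding of APCA points are hypothetical, and the example series are assumed to be A = {4, 6, 1, 0, 2} and B = {4, 3, 5, 1, 3}.

```python
import heapq
import math

def euclid(q, c):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))

def d_lb(q, apca):
    """Lower-bounding distance (Equation 5); apca is a list of (cv, cr) pairs."""
    total, start = 0.0, 0
    for cv, cr in apca:
        segment = q[start:cr]
        qv = sum(segment) / len(segment)        # projected query mean
        total += (cr - start) * (qv - cv) ** 2
        start = cr
    return math.sqrt(total)

def exact_knn(q, database, apca_points, k):
    """Flat variant of ExactKNNSearch: returns ([(distance, id), ...], fetched),
    where fetched counts how many full series were read from the database."""
    queue = [(d_lb(q, rep), idx) for idx, rep in enumerate(apca_points)]
    heapq.heapify(queue)                        # ordered by lower bound
    temp, result, fetched = [], [], 0           # temp is ordered by true distance
    while queue or temp:
        top_dist = queue[0][0] if queue else math.inf
        # a retrieved series is confirmed once no unexplored item can beat it
        while temp and temp[0][0] <= top_dist:
            result.append(heapq.heappop(temp))
            if len(result) == k:
                return result, fetched
        if not queue:
            break
        _, idx = heapq.heappop(queue)
        fetched += 1                            # simulated disk access
        heapq.heappush(temp, (euclid(q, database[idx]), idx))
    return result, fetched

database = [[4, 6, 1, 0, 2], [4, 3, 5, 1, 3]]
apca_points = [[(5, 2), (1, 5)], [(4, 3), (2, 5)]]
Q = [5, 3, 5, 6, 7]
neighbours, fetched = exact_knn(Q, database, apca_points, k=1)
```

In this toy run the nearest neighbour is the second series, and the first series is never fetched because its lower bound already exceeds the confirmed distance. A real index would additionally push internal nodes keyed by MINDIST.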
Let us consider a node U with MBR R = (L, H). We can view the MBR as two APCA representations $L = \{\langle l_1, l_2\rangle, \dots, \langle l_{N-1}, l_N\rangle\}$ and $H = \{\langle h_1, h_2\rangle, \dots, \langle h_{N-1}, h_N\rangle\}$. The view of a 6-dimensional MBR ($\{l_1, l_2, \dots, l_6\}$, $\{h_1, h_2, \dots, h_6\}$) as two APCA representations $\{\langle l_1, l_2\rangle, \langle l_3, l_4\rangle, \langle l_5, l_6\rangle\}$ and $\{\langle h_1, h_2\rangle, \langle h_3, h_4\rangle, \langle h_5, h_6\rangle\}$ is shown in Figure 10. Any time series $C = \{c_1, c_2, \dots, c_n\}$ under the node U is contained within the two bounding time series L and H (as shown in Figure 10). In order to formalize this notion of containment, we define a set of M regions associated with R. The i-th region $G_i$ (i = 1, …, M) associated with R is defined as the 2-dimensional rectangular region in the value-time space that fully contains the i-th segment of all time series stored under U.

Footnote: Note that MINDIST(Q,R) does not have to lower bound $D_{LB}(Q',\mathbf{C})$ for any $\mathbf{C}$ under U; it just has to lower bound D(Q,C) for any C under U.

Footnote: Index structures can allow external applications to plug in domain-specific MINDIST functions and point-to-point distance functions and retrieve nearest neighbors based on those functions (e.g., the Consistent function in GiST).

The boundary of a region G, being

a 2-d rectangle, is defined by 4 numbers: the low bounds G[1] and G[2] and the high bounds G[3] and G[4] along the value and time axes respectively.

[Figure 10: a 6-dimensional MBR drawn in the value-time plane as the two bounding series $L = \{\langle l_1, l_2\rangle, \langle l_3, l_4\rangle, \langle l_5, l_6\rangle\}$ and $H = \{\langle h_1, h_2\rangle, \langle h_3, h_4\rangle, \langle h_5, h_6\rangle\}$, with Region 1 $G_1 = \{l_1, 1, h_1, h_2\}$, Region 2 $G_2 = \{l_3, l_2+1, h_3, h_4\}$ and Region 3 $G_3 = \{l_5, l_4+1, h_5, h_6\}$. Any time series $C = \{c_1, \dots, c_n\}$ under this node with MBR = (L, H) is contained between L and H; the dots on the time series mark the starts and ends of the 3 APCA segments. Two time instants $t_1$ and $t_2$ are marked on the time axis.]

Figure 10: The M regions associated with a 2M-dimensional MBR. The boundary of a region G is denoted by G = {G[1], G[2], G[3], G[4]}

By definition,

$$G_i[1] = \min_{C \text{ under } U}(cmin_i), \quad G_i[2] = \min_{C \text{ under } U}(cr_{i-1} + 1), \quad G_i[3] = \max_{C \text{ under } U}(cmax_i), \quad G_i[4] = \max_{C \text{ under } U}(cr_i) \qquad (9)$$

Based on the definition of MBR in Definition 4.2, $G_i$ can be defined in terms of the MBR R as follows:

Definition 4.3 (Definition of regions associated with MBR)

$$G_i[1] = l_{2i-1}, \quad G_i[2] = l_{2i-2} + 1, \quad G_i[3] = h_{2i-1}, \quad G_i[4] = h_{2i} \qquad (10)$$

Figure 10 shows the 3 regions associated with the 6-dimensional MBR ($\{l_1, l_2, \dots, l_6\}$, $\{h_1, h_2, \dots, h_6\}$). We illustrate the region computation using a numeric example in Example 4.

Example 4 (Region Computation): Let us consider the MBR R in Example 3. Recall R = ($\{l_1, l_2, l_3, l_4\}$, $\{h_1, h_2, h_3, h_4\}$) = ({3, 2, 0, 5}, {6, 3, 3, 5}). The two regions associated with R are:

$G_1$ = {min($amin_1$, $bmin_1$), 1, max($amax_1$, $bmax_1$), max($ar_1$, $br_1$)} (by Equation 9) = {$l_1$, 1, $h_1$, $h_2$} (by Equation 10) = {3, 1, 6, 3}

$G_2$ = {min($amin_2$, $bmin_2$), min($ar_1 + 1$, $br_1 + 1$), max($amax_2$, $bmax_2$), max($ar_2$, $br_2$)} = {$l_3$, $l_2 + 1$, $h_3$, $h_4$} = {0, 3, 3, 5}
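Examples 3 and 4 can be reproduced with a short sketch. The helper names below (`apca_rectangle`, `mbr`, `regions`) are hypothetical, the segment boundaries are passed in directly rather than computed by Compute_APCA, and the series values are assumed to be A = {4, 6, 1, 0, 2} and B = {4, 3, 5, 1, 3}. The region computation follows Definition 4.3 literally, taking $l_0 = 0$ for the first region.

```python
def apca_rectangle(series, right_endpoints):
    """Rectangle inserted for one series under Definition 4.2:
    low  = (cmin_1, cr_1, ..., cmin_M, cr_M)
    high = (cmax_1, cr_1, ..., cmax_M, cr_M)"""
    low, high, start = [], [], 0
    for cr in right_endpoints:
        segment = series[start:cr]
        low += [min(segment), cr]
        high += [max(segment), cr]
        start = cr
    return low, high

def mbr(rectangles):
    """Smallest rectangle (L, H) that contains all the given rectangles."""
    lows, highs = zip(*rectangles)
    L = [min(values) for values in zip(*lows)]
    H = [max(values) for values in zip(*highs)]
    return L, H

def regions(L, H):
    """Definition 4.3 with 0-based lists:
    G_i = [l_{2i-1}, l_{2i-2} + 1, h_{2i-1}, h_{2i}], where l_0 = 0."""
    out = []
    for j in range(len(L) // 2):
        previous_end = L[2 * j - 1] if j > 0 else 0
        out.append([L[2 * j], previous_end + 1, H[2 * j], H[2 * j + 1]])
    return out

A = [4, 6, 1, 0, 2]
B = [4, 3, 5, 1, 3]
rect_a = apca_rectangle(A, [2, 5])   # ([4, 2, 0, 5], [6, 2, 2, 5])
rect_b = apca_rectangle(B, [3, 5])   # ([3, 3, 1, 5], [5, 3, 3, 5])
L, H = mbr([rect_a, rect_b])
G = regions(L, H)
```

Because the even dimensions of the MBR hold segment endpoints, the time extent of each region falls out of the same min/max pass that bounds the values.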

At time instance t (t = 1, …, n), we say a region $G_i$ is active iff $G_i[2] \le t \le G_i[4]$. For example, in Figure 10, only regions 1 and 2 are active at time instant $t_1$ while regions 1, 2 and 3 are active at time instant $t_2$. The value $c_t$ of a time series C under U at time instant t must lie within one of the regions active at t, i.e., $\bigvee_{G_i \text{ is active}} G_i[1] \le c_t \le G_i[3]$.

Lemma 3: The value $c_t$ of a time series C under U at time instant t must lie within one of the regions active at t.

Proof: Let us consider a region $G_i$ that is not active at time instant t, i.e., either $G_i[2] > t$ or $G_i[4] < t$. First, let us consider the case $G_i[2] > t$. By definition, $G_i[2] \le cr_{i-1} + 1$ for any C under U. Since $G_i[2] > t$, $t < cr_{i-1} + 1$, i.e., $c_t$ is not in segment i. Now let us consider the case $G_i[4] < t$. By definition, $G_i[4] \ge cr_i$ for any C under U. Since $G_i[4] < t$, $t > cr_i$, i.e., $c_t$ is not in segment i. Hence, if region $G_i$ is not active at t, $c_t$ cannot lie in segment i, i.e., $c_t$ can lie in segment i only if $G_i$ is active. By definition of regions, $c_t$ must lie within one of the regions active at t, i.e., $\bigvee_{G_i \text{ is active}} G_i[1] \le c_t \le G_i[3]$.

[Figure 11: a query time series $Q = \{q_1, \dots, q_n\}$ plotted against Regions 1 to 3 of Figure 10, with MINDIST(Q,R,t) annotated as the minimum of MINDIST(Q,$G_i$,t) over the regions active at the two marked time instants.]

Figure 11: Computation of MINDIST

Given a query time series $Q = \{q_1, q_2, \dots, q_n\}$, the minimum distance MINDIST(Q,R,t) of Q from R at time instant t (cf. Figure 11) is given by $\min_{\text{region } G \text{ active at } t}$ MINDIST(Q,G,t), where

$$\mathrm{MINDIST}(Q,G,t) = \begin{cases} (G[1] - q_t)^2 & \text{if } q_t < G[1] \\ (q_t - G[3])^2 & \text{if } G[3] < q_t \\ 0 & \text{otherwise} \end{cases} \qquad (11)$$

MINDIST(Q,R) is defined as follows:

$$\mathrm{MINDIST}(Q,R) = \sqrt{\sum_{t=1}^{n} \mathrm{MINDIST}(Q,R,t)} \qquad (12)$$

We illustrate the MINDIST computation using a numeric example in Example 5.

Example 5 (MINDIST Computation): Let us consider the MBR R in Examples 3 and 4 and its associated regions $G_1$ and $G_2$. Let us consider the query time series Q = {5, 3, 5, 6, 7} in Example 2.
MINDIST(Q,R,1) = MINDIST(Q,$G_1$,1) = 0

MINDIST(Q,R,2) = MINDIST(Q,$G_1$,2) = 0
MINDIST(Q,R,3) = min(MINDIST(Q,$G_1$,3), MINDIST(Q,$G_2$,3)) = 0
MINDIST(Q,R,4) = MINDIST(Q,$G_2$,4) = $(6-3)^2$ = 9
MINDIST(Q,R,5) = MINDIST(Q,$G_2$,5) = $(7-3)^2$ = 16
MINDIST(Q,R) = $\sqrt{0 + 0 + 0 + 9 + 16}$ = 5.
Note that MINDIST(Q,R) lower bounds D(Q,A) = $\sqrt{(5-4)^2 + (3-6)^2 + (5-1)^2 + (6-0)^2 + (7-2)^2}$ = 9.327 and D(Q,B) = $\sqrt{(5-4)^2 + (3-3)^2 + (5-5)^2 + (6-1)^2 + (7-3)^2}$ = 6.48 (formal proof below).

Lemma 4: MINDIST(Q,R) lower bounds D(Q,C) for any time series C under U.

Proof: We will first show that MINDIST(Q,R,t) lower bounds $D(Q,C,t) = (q_t - c_t)^2$ for any time series C under U. We know that $c_t$ must lie in one of the active regions (Lemma 3). Without loss of generality, let us assume that $c_t$ lies in an active region G, i.e., $G[1] \le c_t \le G[3]$. Hence MINDIST(Q,G,t) ≤ D(Q,C,t). Also, MINDIST(Q,R,t) ≤ MINDIST(Q,G,t) (by definition of MINDIST(Q,R,t)). Hence MINDIST(Q,R,t) lower bounds D(Q,C,t). Since MINDIST(Q,R) = $\sqrt{\sum_{t=1}^{n} \mathrm{MINDIST}(Q,R,t)}$ and D(Q,C) = $\sqrt{\sum_{t=1}^{n} D(Q,C,t)}$, MINDIST(Q,R,t) ≤ D(Q,C,t) implies MINDIST(Q,R) ≤ D(Q,C).

Note that, in general, the lower the number of active regions at any instant of time, the higher the MINDIST and the better the performance of the K-NN algorithm. Also, the narrower the regions along the value dimension, the higher the MINDIST. These two principles justify our choice of the dimensions of the APCA space. The odd dimensions help cluster APCA points with similar $cv_i$'s, thus keeping the regions narrow along the value dimension. The even dimensions help cluster APCA points that are approximately aligned at the segment end points, thus ensuring only one region (the minimum possible) is active for most instants of time.

Algorithm ExactRangeSearch(Q, ε, T)
begin
 1. if T is a non-leaf node
 2.    for each child U of T
 3.       if MINDIST(Q,R) ≤ ε ExactRangeSearch(Q, ε, U); // R is the MBR of U
 4.    endfor
 5. else // T is a leaf node
 6.    for each APCA point C in T
 7.       if D_LB(Q',C) ≤ ε
 8.          Retrieve full time series C from database;
 9.          if D(Q,C) ≤ ε Add C to result;
10.       endif
11.    endfor
12. endif
end

Table 7: Range search algorithm to retrieve all the time series within a range of ε from query time series Q. The function is invoked as ExactRangeSearch(Q, ε, root_node_of_index).
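The MINDIST definition of Equations 11 and 12 translates directly into code. The sketch below is illustrative only: the function names are hypothetical, each region is a 4-element list [G[1], G[2], G[3], G[4]], and the two regions are hard-coded under the assumption that Example 4 yields $G_1$ = {3, 1, 6, 3} and $G_2$ = {0, 3, 3, 5}.

```python
import math

def mindist_region(q_t, region):
    """Equation 11: squared gap between q_t and the value range of one region."""
    low_value, _start, high_value, _end = region
    if q_t < low_value:
        return (low_value - q_t) ** 2
    if q_t > high_value:
        return (q_t - high_value) ** 2
    return 0.0

def mindist(q, region_list):
    """Equation 12: at every time instant take the minimum over the active
    regions (those with G[2] <= t <= G[4]), then sum and take the root."""
    total = 0.0
    for t, q_t in enumerate(q, start=1):          # time instants are 1-based
        active = [g for g in region_list if g[1] <= t <= g[3]]
        total += min(mindist_region(q_t, g) for g in active)
    return math.sqrt(total)

G = [[3, 1, 6, 3], [0, 3, 3, 5]]
Q = [5, 3, 5, 6, 7]
node_distance = mindist(Q, G)
```

For these inputs only the last two time instants contribute (9 and 16), giving a node distance of 5, which is below both D(Q,A) ≈ 9.327 and D(Q,B) ≈ 6.48. The sketch assumes at least one region is active at every instant, which holds whenever the regions come from a node that actually stores series of length n.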
Although we have focused on K-NN search in this section, the definitions of $D_{LB}$ and MINDIST proposed in this paper are also needed for answering range queries using a multidimensional index structure. The range search algorithm is shown in Table 7. It is a straightforward R-tree-style recursive search algorithm combined with the GEMINI range query

algorithm shown in Table 2. Since both MINDIST(Q,R) and $D_{LB}(Q',\mathbf{C})$ lower bound D(Q,C), the above algorithm is correct [6].

In this section, we described how to find the exact nearest neighbors of a query time series using a multidimensional index structure. In Section 3.3.1, we proposed an approximate Euclidean distance measure $D_{AE}(Q,\mathbf{C})$ for fast approximate search. If we want to use the same index structure to answer both exact queries and approximate queries, we can simply replace the distance function $D_{LB}(Q',\mathbf{C})$ in Line 15 of the K-NN algorithm (Table 6) by $D_{AE}(Q,\mathbf{C})$ to switch from exact to approximate queries and vice versa. Since $D_{AE}(Q,\mathbf{C})$ is a tighter approximation of D(Q,C) than $D_{LB}(Q',\mathbf{C})$, the K-NN algorithm would need to retrieve fewer APCA points from the index before the algorithm stops. This would result in fewer disk accesses to retrieve the full time series corresponding to the retrieved APCA points (Line 11 of Table 6), leading to lower query cost. Since the approximate distance $D_{AE}(Q,\mathbf{C})$ between a time series query $Q = \{q_1, q_2, \dots, q_n\}$ and an APCA point $\mathbf{C} = \{cv_1, cr_1, \dots, cv_M, cr_M\}$ almost always lower bounds the Euclidean distance D(Q,C) between Q and the original time series $C = \{c_1, c_2, \dots, c_n\}$ (see Figure 8), the approximate function can be used to get reasonably accurate results more efficiently using the same index structure.

If an index is used exclusively for approximate search based on $D_{AE}$, further optimizations are possible. For such an index, we can construct the MBRs as defined in Definition 4.1, i.e., by inserting the APCA point $\mathbf{C} = \{cv_1, cr_1, \dots, cv_M, cr_M\}$ itself instead of the corresponding rectangle ($\{cmin_1, cr_1, \dots, cmin_M, cr_M\}$, $\{cmax_1, cr_1, \dots, cmax_M, cr_M\}$). The MINDIST computation is the same as in the exact case. It can be shown that the MINDIST(Q,R) of the query from the above MBR (Definition 4.1) lower bounds $D_{AE}(Q,\mathbf{C})$, therefore ensuring retrieval of APCA points in the order of their distances $D_{AE}(Q,\mathbf{C})$. Since these MBRs are always smaller than the MBRs in Definition 4.2, the MINDISTs will be larger, resulting in fewer node accesses of the index structure compared to approximate search using the same index as the exact search, and hence even better performance.
To exploit this optimization, one can maintain two separate indices (one with MBRs as defined in Definition 4.2 and one with those defined in Definition 4.1) for exact and approximate searches respectively.

5. Experimental evaluation

In this section, we present the results of an extensive empirical study we have conducted to (1) evaluate APCA in terms of cost of computation, (2) compare APCA with other dimensionality reduction techniques in terms of pruning power and query response times, and (3) ascertain the ability of APCA to support approximate search. The major findings of our study can be summarized as follows:
(1) Fast Computation: APCA can be computed efficiently and is hence tractable for indexing.
(2) High Pruning Power: APCA has significantly higher pruning power compared to DFT, DWT, PAA and FastMap, i.e., fewer false alarms.
(3) Low Query Cost: APCA outperforms other indexing techniques, namely DFT, DWT and linear scan, in terms of query cost, often by one to two orders of magnitude.
(4) Fast Approximate Search: APCA can support fast approximate search with a high level of accuracy ([…]8[…]%).
Thus, our experimental results validate the thesis of the paper that APCA is an effective dimensionality reduction technique for time series databases.

5.1 Experiment methodology

We experimentally compare all the state of the art indexing techniques with our proposed method. We have taken great care to create high quality implementations of all competing techniques, as discussed in detail below.

(1) SVD: Each time series of length n (n-dimensional point) is reduced to a point in an N-dimensional space using Singular Value Decomposition [4].
(2) FastMap: Each time series of length n is mapped to a point in an N-dimensional space using FastMap as proposed in [5].
(3) DFT: DFT is applied individually to each time series (of length n) as proposed in […]. Since we are computing the DFT of a real signal, the first imaginary coefficient is zero, and because all objects in our database have had their mean value subtracted, the first real coefficient is also zero. These coefficients do not need to be retained, making room for additional coefficients. We further optimize the representation by utilizing the symmetric properties of the DFT as suggested in [39], i.e., we can simply double the magnitude of the 2nd to (N+1)th real coefficients and use it as the N-dimensional point.
(4) DWT: Each time series (of length n) is individually decomposed using the Haar wavelet decomposition as proposed in [9]. Since the objects have zero mean, the first Haar coefficient is always zero. We use the magnitudes of the 2nd to (N+1)th Haar coefficients to construct the N-dimensional point.
(5) PAA (also known as Segmented Means [5]): The N-dimensional representation is computed by dividing the time series (of length n) into N equal-length segments and recording the means of those segments as proposed in [4] and [5].
(6) APCA: The N-dimensional representation is computed by running Compute_APCA(C, N/2) on each time series C (of length n).

We performed tests over a range of original dimensionalities (n) varying from 256 to 1024 and reduced dimensionalities (N) varying from 16 to […]. We used two datasets, one chosen because it is very heterogeneous and the other because it is very homogeneous.

Homogeneous Data: Electrocardiogram. This dataset is taken from the MIT Research Resource for Complex Physiologic Signals [3]. It is a relatively clean and uncomplicated electrocardiogram. We generated data of 3 different dimensionalities: n=1024, n=512 and n=256 (see footnote 4). In each case, the dataset consisted of […] points/sequences.

Heterogeneous Data: Mixed Bag. This dataset we created by combining 7 datasets with widely varying properties of shape, structure, noise etc.
The only preprocessing performed was to ensure that each time series had a mean of zero and a standard deviation of one (otherwise many queries become pathologically easy). The 7 datasets are Space Shuttle STS-57 [7, 5], Arrhythmia [3], Random Walk [46, 34, 5, 4], INTERBALL Plasma processes (figure 4) [43], Astrophysical data (figure […]) [47], Pseudo Periodic Synthetic Time Series [4], and Exchange rate (figure 4) [47]. Once again, we generated data of 3 different dimensionalities, n=1024, n=512 and n=256, and in each case the dataset consisted of […] points.

To perform realistic testing we need queries that do not have exact matches in the database but have similar properties of shape, structure, spectral signature, variance etc. To achieve this we used cross validation. We removed 10% of the dataset, and built the index with the remaining 90%. The queries are then randomly taken from the withheld subsection. For each result reported for a particular dimensionality and query length, we averaged the results of 5 experiments.

Footnote 4: Because we wanted to include the DWT in our experiments, we chose n to be an integer power of two. We consider a length of 1024 to be the longest query likely to be encountered (by analogy, one might query a text database with a word, a phrase or a complete sentence, but there would be little utility in a paragraph-length text query. A time series query of length 1024 corresponds approximately with a sentence-length text query).
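The preprocessing and hold-out protocol described above can be sketched as follows. This is an assumption-laden illustration, not the authors' code: the function names are hypothetical, the population standard deviation is used because the text does not say which estimator was applied, and the random seed and shuffling strategy are arbitrary choices.

```python
import random
import statistics

def z_normalize(series):
    """Rescale a series to zero mean and unit standard deviation.
    Assumes the series is not constant (otherwise the deviation is zero)."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    return [(x - mean) / std for x in series]

def holdout_split(dataset, fraction=0.1, seed=0):
    """Withhold a fraction of the dataset as the query pool and return
    (indexed, queries); the index is built from the remaining series."""
    rng = random.Random(seed)
    order = list(range(len(dataset)))
    rng.shuffle(order)
    n_queries = int(len(dataset) * fraction)
    queries = [dataset[i] for i in order[:n_queries]]
    indexed = [dataset[i] for i in order[n_queries:]]
    return indexed, queries

normalized = z_normalize([1.0, 2.0, 3.0, 4.0])
data = [[float(i), i + 1.0] for i in range(20)]
indexed, queries = holdout_split(data)
```

Drawing queries only from the withheld pool guarantees that no query has an exact match in the index while keeping its statistical properties similar to the indexed data.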

For simplicity we only show results for nearest neighbor queries; however, we obtained similar results for range queries.

5.2 Experimental results: Computing the dimensionality reduced representation

We begin our experiments by measuring the time taken to compute the reduced dimensionality representation for each of the suggested approaches. We did this for query lengths (n) from 32 to 1024 and database sizes of 4[…]KB ([…] objects) to […]KB (3[…] objects). The relatively small databases were necessary to include SVD in the experiments. We used a Pentium PC 4[…] with 256 megs of RAM. Experimental runs requiring more than […] seconds were abandoned, as indicated by the black-topped histogram bars in Figure 12.

[Figure 12: one histogram panel each for SVD, DFT, DWT, APCA and PAA; axes are query length, database size and time in seconds.]

Figure 12: The time taken (in seconds) to compute the reduced representation using various techniques over a range of query lengths (n varying from 32 to 1024) and database sizes (S varying from 4[…]KB ([…] objects) to […]KB (3[…] objects)). The black-topped histogram bars indicate that an experimental run was abandoned at […] seconds.

We can see that SVD, being $O(Sn^2)$, is simply intractable for even moderately sized databases with high query length. We extrapolated from these experiments that it would take several months of CPU time to include SVD in all the experiments in this paper. For this reason we shall exclude SVD from the rest of the experiments (in Section 6 we will discuss more reasons why SVD is not a practical approach). The results for DWT and APCA are virtually indistinguishable, which is to be expected given that the algorithm used to create the APCA spends most of its time in a subroutine call to the DWT. The main conclusion of this experiment is that APCA is tractable for indexing.

5.3 Experimental results: Pruning power

In comparing the four competing techniques (DFT, DWT, APCA and PAA) there exists a danger of implementation bias, that is, consciously or unconsciously implementing the code such that some approach is favored. As an example of the potential for implementation bias in this work, consider the following. At query time DFT must do a Fourier transform of the query.
We could use the naïve algorithm, which is O(n^2), or the faster radix-2 algorithm (padding the query with zeros when n is not an integer power of two), which is O(n log n). If we implemented the simple algorithm it would make the other indexing methods appear to perform better relative to DFT. While we do present a detailed experimental evaluation of an implemented system in the next section, we also present experiments in this section which are free of the possibility of implementation bias. We achieve this by comparing the pruning power of the various approaches.⁵

⁵ We also include FastMap [5] (along with DFT, DWT, APCA and PAA) in our pruning power experiments for completeness.

To compare the pruning power of the four techniques under consideration we measure P, the fraction of the database that must be examined before we can guarantee that we have found the nearest match to a 1-NN query.

$$P = \frac{\text{Number of objects that must be examined}}{\text{Number of objects in database}} \qquad (3)$$

To calculate P we do the following. Random queries are generated (as described above).

Objects in the database are examined in order of increasing (feature space) distance from the query until the distance in feature space of the next unexamined object is greater than the minimum actual distance of the best match so far. The number of objects examined at this point is the absolute minimum in order to guarantee no false dismissals. Note the value of P for any transformation depends only on the data and is completely independent of any implementation choices, including spatial access method, page size, computer language or hardware. A similar idea for evaluating indexing schemes appears in [8]. Figure 3 shows the value of P over a range of query lengths and dimensionalities for the experiments that were conducted on the Mixed Bag dataset.

[Figure 3 panels: FastMap, DFT, DWT, PAA, APCA]

Figure 3: The fraction P of the Mixed Bag database that must be examined by the five dimensionality reduction techniques being compared, over a range of original dimensionalities (n varying from 256 to 1024) and reduced dimensionalities (N varying from 16 to 64). To preserve a meaningful scale for the graphs, we truncated (poor) values at a certain threshold; the truncated values are shown as black-topped histogram bars.

Note that the results for PAA and DWT are identical. This is because the pruning power of DWT and PAA are identical when N is an integer power of two [4]. Having empirically shown this fact, which was proved in [4, 5], we have excluded PAA from future experiments for clarity. We repeated the experiment for the Electrocardiogram data; the results are shown in Figure 4.

[Figure 4 panels: FastMap, DFT, DWT, APCA]

Figure 4: The fraction P of the Electrocardiogram database that must be examined by the three dimensionality reduction techniques being compared, over a range of original dimensionalities (n varying from 256 to 1024) and reduced dimensionalities (N varying from 16 to 64). As in Figure 3, we truncated (poor) values at a certain threshold; the truncated values are shown as black-topped histogram bars.

As shown in Figures 3 and 4, FastMap has significantly lower pruning power compared to the other techniques.
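The procedure for measuring P described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the experiments: the function name `pruning_power` and its two inputs (precomputed feature-space distances and true distances for a single query) are hypothetical, and the feature-space distance is assumed to lower-bound the true distance.

```python
def pruning_power(lower_bounds, true_dists):
    """Fraction of the database examined before the nearest neighbor is guaranteed.

    lower_bounds[i]: feature-space distance from the query to object i
    true_dists[i]:   actual distance from the query to object i
    Both lists are hypothetical precomputed inputs for one query.
    """
    if len(lower_bounds) != len(true_dists) or not true_dists:
        raise ValueError("inputs must be non-empty and of equal length")
    # Visit objects in order of increasing feature-space distance.
    order = sorted(range(len(lower_bounds)), key=lambda i: lower_bounds[i])
    best_so_far = float("inf")
    examined = 0
    for i in order:
        # Stop once the next unexamined object cannot beat the best match so far.
        if lower_bounds[i] > best_so_far:
            break
        best_so_far = min(best_so_far, true_dists[i])
        examined += 1
    return examined / len(true_dists)


# Four objects; the second-closest object in feature space is already
# farther away than the true distance of the first, so the scan stops early.
p = pruning_power([0.5, 2.0, 3.5, 4.0], [1.0, 3.0, 4.0, 5.0])
```

Because the function takes only distances as input, the value it returns does not depend on any index structure, page size or hardware, which is the property the measure is meant to have. In an experiment it would be averaged over many random queries.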
FastMap's lower pruning power can be explained by the way FastMap works: while other approaches take the high-dimensional representation of the time series and project them into an N-dimensional space, FastMap attempts to model the dataset in an N-dimensional space such that the distances between objects are approximately preserved. That is to say, FastMap considers only the distances between the sequences, and disregards the actual shape of the sequences. This attribute of FastMap makes it an ideal technique for situations where the objects to be indexed have no natural features, but there exists a method to compute the distances between them (e.g., indexing strings under edit distance or time series under dynamic time warping [5]). In the case where high quality natural features can be extracted, FastMap is outperformed by techniques (like DFT, DWT and APCA) that use those features to obtain the reduced representation. Having shown this in the pruning power experiments, we exclude FastMap from further experiments for clarity.⁶

⁶ Note that for fairness we did not include FastMap in our experiments on the time required to compute the reduced representation (the timing experiment reported earlier in this section). This is because its implementation requires several user-defined

In both Figures 3 and 4 we can see that APCA outperforms DFT and DWT significantly, generally by an order of magnitude. These experiments indicate that the APCA technique has fewer false alarms, hence lower query cost, as confirmed by the experiments below.

5.4 Experimental results: Implemented system

Although the pruning power experiments are powerful predictors of the (relative) performance of indexing systems using the various dimensionality reduction schemes, we include a comparison of implemented systems for completeness. We implemented four indexing techniques: linear scan, DFT-index, DWT-index and APCA-index. We compare the four techniques in terms of the I/O and CPU costs incurred to retrieve the exact nearest neighbor of a query time series. All the experiments reported in this subsection were conducted on a Sun Ultra Enterprise 45 machine with 4 96MHz CPUs, GB of physical memory and several GB of secondary storage, running Solaris .6.

Cost Measurements: We measured the I/O and CPU costs of the four techniques as follows:

(1) Linear Scan (LS): In this technique, we perform a simple linear scan on the original n-dimensional dataset and determine the exact nearest neighbor of the query. The I/O cost in terms of sequential disk accesses is (S*(n*sizeof(float) + sizeof(id)))/(PageSize). Since sizeof(id) << (n*sizeof(float)), we will ignore the sizeof(id) henceforth. Assuming sequential I/O is about times faster than random I/O, the cost in terms of random accesses is (S*sizeof(float)*n)/(PageSize*). The CPU cost is the cost of computing the distance D(Q,C) of the query Q from each time series C = {c_1, …, c_n} in the database.

(2) DFT-index (DFT): In this technique, we reduce the dimensionality of the data from n to N using DFT and build an index on the reduced space using a multidimensional index structure. We use the hybrid tree as the index structure. The I/O cost of a query has two components: (i) the cost of accessing the nodes of the index structure and (ii) the cost of accessing the pages to retrieve the full time series from the database for each indexed item retrieved (cf. Table 6). For the second component, we assume that a full time series access costs one random disk access.
The total I/O cost (in terms of random disk accesses) is the number of index nodes accessed plus the number of indexed items retrieved by the K-NN algorithm before the algorithm stopped (i.e. before the distance of the next unexamined object in the indexed space is greater than the minimum of the actual distances of items retrieved so far). The CPU cost also has two components: (i) the CPU time (excluding the I/O wait) taken by the K-NN algorithm to navigate the index and retrieve the indexed items and (ii) the CPU time to compute the exact distance D(Q,C) of the query Q from the original time series C of each indexed item retrieved (cf. Table 6). The total CPU cost is the sum of the two costs.

(3) DWT-index (DWT): In this technique, we reduce the dimensionality of the data from n to N using DWT and build the index on the reduced space using the hybrid tree index structure. The I/O and CPU costs are computed in the same way as for DFT.

(4) APCA-index (APCA): In this technique, we reduce the dimensionality of the data from n to N using APCA and build the index on the reduced space using the hybrid tree index structure. The I/O and CPU costs are computed in the same way as for DFT and DWT.

⁶ (continued) parameters and we could not guarantee our implementation was the most efficient. We note that its time complexity is O(SN) while that of APCA is O(Sn log(n)).

We chose the hybrid tree as the index structure for our experiments since it is a space

partitioning index structure (dimensionality-independent fanout) and has been shown to scale to high dimensionalities [6, 37, 4]. Since we had access to the source code of the index structure (http://www-db.cs.uc.edu) we implemented the optimization discussed in Section 4 (i.e. to increase leaf node fanout) for our experiments. We used a page size of 4KB for all our experiments.

Dataset: We used the Electrocardiogram (ECG) database for these experiments. We created 3 datasets from the ECG database by choosing 3 different values of query length n (256, 512 and 1024). For each dataset, we reduced the dimensionality to N = 16, N = 32 and N = 64 using each of the 3 dimensionality reduction techniques (DFT, DWT and APCA) and built the hybrid tree indices on the reduced spaces (resulting in a total of 9 indices for each technique). As mentioned before, the queries were chosen randomly from the withheld section of the dataset. All our measurements are averaged over 5 queries.

[Figure 5 panels: LS, DFT, DWT, APCA]

Figure 5: Comparison of LS, DFT, DWT and APCA techniques in terms of I/O cost (number of random disk accesses) over a range of original dimensionalities (n varying from 256 to 1024) and reduced dimensionalities (N varying from 16 to 64). For LS, the cost is computed as the number of sequential disk accesses divided by the assumed sequential-to-random I/O speed ratio. We used the ECG dataset for this experiment.

Figure 5 compares the LS, DFT, DWT and APCA techniques in terms of I/O cost (measured by the number of random disk accesses) for the 3 datasets (n = 256, 512 and 1024) and 3 different dimensionalities of the index (N = 16, 32 and 64). The APCA technique significantly outperforms the other 3 techniques in terms of I/O cost. The LS technique suffers due to the large database size (e.g., , sequential disk accesses for n = 1024 which is equivalent to , random disk accesses). Although LS is not considerably worse than APCA in terms of I/O cost, it is significantly worse in terms of the overall cost due to its high CPU cost component (see Figure 6). The DFT and DWT suffer mainly due to low pruning power (cf. Figure 4).
Since DFT and DWT retrieve a large number of indexed items before it can be guaranteed that the exact nearest neighbor is among the retrieved items, the second component of the I/O cost (that of retrieving full time series from the database) tends to be high. The DFT and DWT costs are the highest for large n and small N (e.g., n = 1024, N = 16) as the pruning power is the lowest for those values (cf. Figure 4). The DWT technique shows a U-shaped curve for n = 1024: when the reduced dimensionality is low (N = 16), the second component of the I/O cost is high due to low pruning power, while when N is high (N = 64), the first component of the I/O cost (index node accesses) becomes large due to the dimensionality curse. We did not observe such U-shaped behavior in the other techniques as their costs were either dominated entirely by the first component (e.g., the n = 256 and n = 512 cases of APCA) or by the second component (all of DFT and the n = 1024 case of APCA).

Figure 6 compares the LS, DFT, DWT and APCA techniques in terms of CPU cost (measured in seconds) for the 3 datasets (n = 256, 512 and 1024) and 3 different dimensionalities of the index (N = 16, 32 and 64). Once again, the APCA technique significantly outperforms the other 3 techniques in terms of CPU cost. The LS technique is the worst in terms of CPU cost as it computes the exact (n-dimensional) distance D(Q,C) of the query Q from every time series C in the database. The DFT and DWT techniques suffer again due to their low pruning power (cf. Figure 4), causing the second component of the CPU cost (i.e. the time to compute the exact

distances D(Q,C) of the original time series of the retrieved APCA points from the query) to become high.

[Figure 6 panels: LS, DFT, DWT, APCA]

Figure 6: Comparison of LS, DFT, DWT and APCA techniques in terms of CPU cost (seconds) over a range of original dimensionalities (n varying from 256 to 1024) and reduced dimensionalities (N varying from 16 to 64). We used the ECG dataset for this experiment.

5.5 Experimental results: Scalability Experiments

In the previous subsection, we reported the I/O and CPU costs instead of the actual wall clock times required to answer the queries. We did that for two reasons: first, it gives us information about the two components of the cost individually, enabling us to do a better comparison; and second, the wall clock times would have been misleading as they would not have included the I/O cost component at all (because the index always fit in main memory (GB of physical memory) and there was no actual I/O involved). To measure wall clock times, we ran experiments where the index did not fit in main memory (by using larger datasets (up to 5, objects) and running them on a machine with less physical memory); we report those results in this subsection. We compare the four indexing techniques, namely, linear scan (LS), DFT-index (DFT), DWT-index (DWT) and APCA-index (APCA), in terms of the wall clock time (elapsed time) required to retrieve the exact nearest neighbor of a query time series. The wall clock time includes both the I/O wait time and the compute time (CPU cost). As before, we used the hybrid tree index structure for our experiments; the page size was 4KB. Due to the unavailability of larger real-life datasets, we generated , and 5, object databases from the , object ECG database by slightly perturbing the datapoints in the sequence to generate the neighboring datapoints. For this experiment, we fixed the original dimensionality to n = 1024, i.e., we generated 3 datasets containing ,, , and 5, 1024-dimensional points respectively.
[Figure 7 plots: query response time (seconds) versus number of objects in the database; (a) APCA, DFT, DWT, LS; (b) APCA, DFT, DWT]

Figure 7: (a) Comparison of LS, DFT, DWT and APCA techniques in terms of query response time (wall clock time in seconds) with database size varying from , objects to 5, objects (n = 1024, N fixed). (b) Same as (a), showing just DFT, DWT and APCA for better comparison. We synthetically generated the dataset from the ECG dataset.

We then reduced the dimensionality from n to N using each of the 3 dimensionality

reduction techniques and built hybrid tree indices on the reduced space. The queries were chosen randomly as discussed before and the measurements are averaged over 5 queries. The experiments were conducted on a Sun Ultra machine with 68MHz CPUs and 56MB of physical memory.

Figure 7 compares the LS, DFT, DWT and APCA techniques in terms of the wall clock time required to retrieve the exact nearest neighbor of a query time series. As shown in Figure 7(a), LS is much slower than the other 3 techniques, mainly due to its high computational cost (as it computes the exact 1024-dimensional distance D(Q,C) of the query Q from every time series C in the database). Since the plots of APCA, DFT and DWT are very close to each other in 7(a), we replot them in Figure 7(b) for a better comparison. As before, APCA significantly outperforms the other techniques. DFT and DWT suffer due to their low pruning power, leading to longer I/O wait times as well as more distance computations and hence higher wall clock times.

5.6 Experimental results: Approximate queries

In this section we evaluate the ability of APCA to support approximate search. To evaluate the quality of the returned answer set we used the precision concept from information retrieval [5].

$$\text{precision} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|} \qquad (4)$$

The ideal value of precision is 1, indicating that all the items retrieved are relevant. We compared D_AE to D_LB on the Mixed Bag dataset for K-NN queries. We use a combination of long queries, n = 1024, and few dimensions, N = 16, because this is when all techniques have the worst query response time. For simplicity, we only consider the second component of the I/O cost (i.e. that of retrieving the full time series from the database (cf. Table 6)). This is reasonable since for n = 1024 and N = 16, the second component dominates the total I/O cost (cf. Figure 5). For D_AE we adopted the following simple protocol. We retrieved 3K objects from the database, then used the true Euclidean distance to prune away 2K objects. We then measured the precision of the remaining K items. Here the precision is simply the fraction of items that D_AE reported as being in the set of the top K neighbors, which actually belong in that set.
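The protocol just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `approximate_knn_precision` is hypothetical, the 3K candidates are assumed to be the objects ranked best by the approximate distance, and both distance lists are assumed to be precomputed for a single query.

```python
def approximate_knn_precision(approx_dists, true_dists, k):
    """Precision of a K-NN answer set obtained via an approximate distance.

    approx_dists[i]: approximate distance from the query to object i
    true_dists[i]:   true Euclidean distance from the query to object i
    Both lists are hypothetical precomputed inputs for one query.
    """
    n = len(true_dists)
    if len(approx_dists) != n or not 0 < k <= n:
        raise ValueError("need equal-length inputs and 0 < k <= len(inputs)")
    # Step 1: retrieve the 3K objects ranked best by the approximate distance.
    candidates = sorted(range(n), key=lambda i: approx_dists[i])[:3 * k]
    # Step 2: keep only the K candidates with the smallest true distance.
    answer = sorted(candidates, key=lambda i: true_dists[i])[:k]
    # Step 3: compare against the exact top-K set.
    exact_top_k = set(sorted(range(n), key=lambda i: true_dists[i])[:k])
    hits = sum(1 for i in answer if i in exact_top_k)
    return hits / k


# Object 1 is a true top-2 neighbor but has a poor approximate distance,
# so it is missed when only 3K = 6 of the 8 objects are retrieved.
true_d = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
approx_d = [1.0, 9.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
precision = approximate_knn_precision(approx_d, true_d, 2)
```

Since exactly K items are returned, the denominator of the precision is K, so the value is the fraction of the returned items that belong to the true top-K set.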
The results for each value of K are averaged over 5 runs. The results are reported in Table 8.

K | Disk Accesses D_LB | Disk Accesses D_AE | Precision
5 3,.3 5.8 3,74.3 3.84 3,4. 6.8

Table 8: The precision of D_AE for various values of K, with the number of disk accesses required by both D_AE (approximate search) and D_LB (exact search).

We can see that D_AE is useful for very fast queries that give approximately the same answer set as exact search. This feature is very useful in interactive exploratory analysis of massive datasets.

6. Discussion

Now that the reader is more familiar with the contribution of this paper we will briefly revisit related work. We believe that this paper is the first to suggest locally adaptive indexing in time series indexing. A locally adaptive representation for 2-dimensional shapes was suggested in [8] but no indexing technique was proposed. Also in the context of images, it was noted by [5] that the use of the first N Fourier coefficients does not guarantee the optimal pruning power. They introduced a technique where they adaptively choose which coefficients to keep after looking at the data. However, the choice of coefficients was based upon a global view of the data. Later

work [49] in the context of time series noted that the policy of using the first N wavelet coefficients [9, 49, ] is not generally optimal, but keeping the largest coefficients needs additional indexing space and (more complex) indexing structures. Singular value decomposition is also a data adaptive technique used for time series [8, 4, 3], but it is globally, not locally, adaptive. Recent work [7] has suggested first clustering a multi-dimensional space and then doing SVD on local clusters, making it a semi-local approach. It is not clear, however, that this approach can be made to work for time series. Finally, a representation similar to APCA was introduced in [5] (under the name "piecewise flat approximation") but no indexing technique was suggested.

6.1 Other factors in choosing a representation to support indexing

Although we have experimentally demonstrated that the APCA representation is superior to other approaches in terms of query response time, there are other factors which one may wish to consider when choosing a representation to support indexing. We will briefly consider some of these issues here.

One important issue is the length of queries allowed. For example, the wavelet approach only allows queries with lengths that are an integer power of two [4]. This problem could be addressed by having the system pad zeros up to the next power of two, then filter out the additional false hits. However, this will severely degrade performance. The APCA approach, in contrast, allows arbitrary length queries.

Another important point to consider is the set of distance measures supported by a representation. It has been argued that for many applications, distance measures other than Euclidean distance are required. For example, in [5] the authors noted that the PAA representation can support queries where the distance measure is an arbitrary Lp norm (i.e. p = 1, 2, …, ∞). We refer the interested reader to that paper for a discussion of the utility of these distance metrics, but note that the APCA representation can easily handle such queries by trivial generalizations of Equations 3 and 5 to Equations 6 and 7.
$$D_{AE}(Q,C) = \left( \sum_{i=1}^{M} \sum_{k=1}^{cr_i - cr_{i-1}} \left| cv_i - q_{k+cr_{i-1}} \right|^{p} \right)^{1/p} \qquad (6)$$

$$D_{LB}(Q',C) = \left( \sum_{i=1}^{M} (cr_i - cr_{i-1}) \left| qv_i - cv_i \right|^{p} \right)^{1/p} \qquad (7)$$

Note that as with the approach of [5] we can reuse the same index for any Lp norm.

Almost all time series databases are dynamic. For example, NASA updates its archive of Space Shuttle telemetry data after each mission. Some databases are updated continuously; for example, financial datasets are updated (at least) at the end of each business day. It is therefore important that any indexing technique be able to support dynamic inserts. Our proposed approach (along with DWT, DFT and PAA) has this property. However, dynamic insertion is the Achilles heel of SVD: a single insertion requires recomputing the entire index. Faster methods do exist for incremental updates, but they introduce the possibility of false dismissals [].

7. Conclusions and directions for future work

The main contribution of this paper is to show that a simple, novel dimensionality reduction technique, namely APCA, can outperform more sophisticated transforms by one to two orders of magnitude. In contrast to popular belief [5, 5], we have shown that the APCA representation can be indexed using a multidimensional index structure. In addition to fast exact queries, the approach also allows even faster approximate querying on the same index structure. We have also shown that our approach can support arbitrary Lp norms, again using a single index structure.
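To make the Lp-norm support concrete, the lower-bounding distance of Equation 7 can be sketched as below. This is an illustrative sketch under explicit assumptions rather than the authors' code: the APCA representation of C is taken to be a list of segment mean values `cv` and segment right endpoints `cr` (the number of points up to and including each segment), the query is projected onto the same segments by averaging, only finite p ≥ 1 is handled, and the function names are hypothetical.

```python
def lp_distance(q, c, p):
    """True Lp distance between two equal-length sequences (finite p >= 1)."""
    return sum(abs(a - b) ** p for a, b in zip(q, c)) ** (1.0 / p)


def apca_lower_bound_lp(q, cv, cr, p):
    """Lp lower bound between raw query q and an APCA representation (cv, cr).

    cv[i]: mean value of segment i of the indexed series (assumed)
    cr[i]: right endpoint of segment i, as a count of points (assumed)
    """
    if p < 1:
        raise ValueError("this sketch only covers finite p >= 1")
    total = 0.0
    start = 0
    for value, end in zip(cv, cr):
        length = end - start
        # Project the query onto the segment by taking its mean there.
        q_mean = sum(q[start:end]) / length
        total += length * abs(q_mean - value) ** p
        start = end
    return total ** (1.0 / p)


# A six-point series with two constant segments and its APCA representation.
q = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
c = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0]
cv, cr = [0.0, 5.0], [3, 6]
lb1 = apca_lower_bound_lp(q, cv, cr, 1)
lb2 = apca_lower_bound_lp(q, cv, cr, 2)
```

With segment means, the bound follows from the convexity of |x|^p for p ≥ 1: within each segment, the sum of the p-th powers of the differences is at least the segment length times the p-th power of the mean difference. The same stored (cv, cr) pairs serve every value of p, which is why a single index can be reused.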

The idea of locally adaptive representation is applicable not just to time-series data but to sequence data in general (one-dimensional as well as multidimensional sequences). For example, we applied such a representation for 2-dimensional shapes in [8]. As future work, we intend to apply this idea of locally adaptive representation to more such application domains. We also intend to increase the speedup of our method even further by exploiting the similarity of adjacent sequences (in a similar spirit to the "trail indexing" technique introduced in [6]).

Acknowledgements

We thank Christos Faloutsos for providing us with the FastMap code and helpful suggestions on the paper.

References

[1] Agrawal, R., Faloutsos, C., & Swami, A. (1993). Efficient similarity search in sequence databases. Proceedings of the 4th Conference on Foundations of Data Organization and Algorithms.
[2] Agrawal, R., Psaila, G., Wimmers, E. L., & Zait, M. (1995). Querying shapes of histories. Proceedings of the 21st International Conference on Very Large Databases.
[3] Agrawal, R., Lin, K. I., Sawhney, H. S., & Shim, K. (1995). Fast similarity search in the presence of noise, scaling, and translation in times-series databases. Proceedings of the 21st International Conference on Very Large Data Bases. Zurich. pp 49-5.
[4] Bay, S. D. (). The UCI KDD Archive [http://kdd.cs.uc.edu]. Irvine, CA: University of California, Department of Information and Computer Science.
[5] Bennett, K., Fayyad, U. & Geiger, D. (1999). Density-based indexing for approximate nearest-neighbor queries. Proceedings 5th International Conference on Knowledge Discovery and Data Mining. pp. 33-43, ACM Press, New York.
[6] Chakrabarti, K. & Mehrotra, S. (1999). The Hybrid Tree: An index structure for high dimensional feature spaces. Proceedings of the 15th IEEE International Conference on Data Engineering.
[7] Chakrabarti, K. & Mehrotra, S. (2000). Local dimensionality reduction: A new approach to indexing high dimensional spaces. Proceedings of the 26th Conference on Very Large Databases, Cairo, Egypt.
[8] Chakrabarti, K., Ortega-Binderberger, M., Porkaew, K. & Mehrotra, S. (). Similar shape retrieval in MARS. Proceedings of the IEEE International Conference on Multimedia and Expo.
[9] Chan, K. & Fu, W. (1999). Efficient time series matching by wavelets. Proceedings of the 15th IEEE International Conference on Data Engineering.
[10] Chandrasekaran, S., Manjunath, B. S., Wang, Y. F., Winkeler, J. & Zhang, H. (1997). An eigenspace update algorithm for image analysis. Graphical Models and Image Processing, Vol. 59, No. 5, pp. 3-33.
[11] Chu, K. & Wong, M. (1999). Fast time-series searching with scaling and shifting. Proceedings of the 18th ACM Symposium on Principles of Database Systems, Philadelphia.
[12] Das, G., Lin, K., Mannila, H., Renganathan, G., & Smyth, P. (1998). Rule discovery from time series. Proceedings of the 3rd International Conference of Knowledge Discovery and Data Mining. pp 6-.
[13] Debregeas, A. & Hebrail, G. (1998). Interactive interpretation of Kohonen maps applied to curves. Proceedings of the 4th International Conference of Knowledge Discovery and Data Mining. pp 79-83.

[14] Evangelidis, G., Lomet, D. & Salzberg, B. (1997). The hB-Pi-tree: A multi-attribute index supporting concurrency, recovery and node consolidation. VLDB Journal 6(): -5.
[15] Faloutsos, C., Jagadish, H., Mendelzon, A. & Milo, T. (1997). A signature technique for similarity-based queries. SEQUENCES 97, Positano-Salerno, Italy.
[16] Faloutsos, C., Ranganathan, M., & Manolopoulos, Y. (1994). Fast subsequence matching in time-series databases. Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data. Minneapolis.
[17] Guttman, A. (1984). R-trees: A dynamic index structure for spatial searching. Proceedings ACM SIGMOD Conference. pp 47-57.
[18] Hellerstein, J. M., Papadimitriou, C. H., & Koutsoupias, E. (1997). Towards an analysis of indexing schemes. Sixteenth ACM Symposium on Principles of Database Systems.
[19] Hjaltason, G. & Samet, H. (1995). Ranking in spatial databases. Symposium on Large Spatial Databases. pp 83-95.
[20] Huang, Y. W. & Yu, P. (1999). Adaptive query processing for time-series data. Proceedings of the 5th International Conference of Knowledge Discovery and Data Mining. pp 8-86.
[21] Jonsson, H. & Badal, D. (1997). Using signature files for querying time-series data. First European Symposium on Principles of Data Mining and Knowledge Discovery.
[22] Kahveci, T. & Singh, A. (2001). Variable length queries for time series data. Proceedings 17th International Conference on Data Engineering. Heidelberg, Germany.
[23] Kanth, K. V., Agrawal, D., & Singh, A. (1998). Dimensionality reduction for similarity searching in dynamic databases. Proceedings ACM SIGMOD Conf., pp. 66-76.
[24] Keogh, E., Chakrabarti, K., Pazzani, M. & Mehrotra, S. (). Dimensionality reduction for fast similarity search in large time series databases. Journal of Knowledge and Information Systems.
[25] Keogh, E. & Pazzani, M. (1999). Relevance feedback retrieval of time series data. Proceedings of the 22nd Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval.
[26] Keogh, E. & Pazzani, M. (1998). An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback. Proceedings of the 4th International Conference of Knowledge Discovery and Data Mining. pp 39-4, AAAI Press.
[27] Keogh, E. & Smyth, P. (1997). A probabilistic approach to fast pattern matching in time series databases. Proceedings of the 3rd International Conference of Knowledge Discovery and Data Mining. pp 4-.
[28] Korn, F., Jagadish, H. & Faloutsos, C. (1997). Efficiently supporting ad hoc queries in large datasets of time sequences. Proceedings of SIGMOD '97, Tucson, AZ, pp 89-3.
[29] Lam, S. & Wong, M. (1998). A fast projection algorithm for sequence data searching. Data & Knowledge Engineering 8(3): 3-339.
[30] Li, C., Yu, P. & Castelli, V. (1998). MALM: A framework for mining sequence database at multiple abstraction levels. CIKM. pp 67-7.
[31] Loh, W., Kim, S. & Whang, K. (2000). Index interpolation: an approach to subsequence matching supporting normalization transform in time-series databases. Proceedings 9th International Conference on Information and Knowledge Management.
[32] Moody, G. (). MIT-BIH Database Distribution [http://ecg.mt.edu/dex.html]. Cambridge, MA.
[33] Ng, M. K., Huang, Z., & Hegland, M. (1998). Data-mining massive time series astronomical data sets - a case study. Proceedings of the 2nd Pacific-Asia Conference on Knowledge Discovery and Data Mining. pp 4-4.

[34] Park, S., Lee, D., & Chu, W. (1999). Fast retrieval of similar subsequences in long sequence databases. In 3rd IEEE Knowledge and Data Engineering Exchange Workshop.
[35] Pavlidis, T. (1976). Waveform segmentation through functional approximation. IEEE Transactions on Computers, Vol C-, No. 7, July.
[36] Perng, C., Wang, H., Zhang, S., & Parker, S. (2000). Landmarks: a new model for similarity-based pattern querying in time series databases. Proceedings 16th International Conference on Data Engineering. San Diego, USA.
[37] Porkaew, K., Chakrabarti, K. & Mehrotra, S. (1999). Query refinement for multimedia similarity retrieval in MARS. Proceedings of the ACM International Multimedia Conference, Orlando, Florida, pp 35-38.
[38] Qu, Y., Wang, C. & Wang, S. (1998). Supporting fast search in time series for movement patterns in multiples scales. Proceedings 7th International Conference on Information and Knowledge Management. Washington, DC.
[39] Refiei, D. (1999). On similarity-based queries for time series data. Proc of the 15th IEEE International Conference on Data Engineering. Sydney, Australia.
[40] Roussopoulos, N., Kelley, S. & Vincent, F. (1995). Nearest neighbor queries. SIGMOD Conference 1995: 7-79.
[41] Seidl, T. & Kriegel, H. (1998). Optimal multi-step k-nearest neighbor search. SIGMOD Conference: pp 54-65.
[42] Shatkay, H. & Zdonik, S. (1996). Approximate queries and representations for large data sequences. Proceedings 12th IEEE International Conference on Data Engineering. pp 546-553.
[43] Shevchenko, M. (). [http://www.k.rss.ru/] Space Research Institute. Moscow, Russia.
[44] Stollnitz, E., DeRose, T., & Salesin, D. (1995). Wavelets for computer graphics: A primer. IEEE Computer Graphics and Applications.
[45] Struzik, Z. & Siebes, A. (1999). The Haar wavelet transform in the time series similarity paradigm. Proceedings 3rd European Conference on Principles and Practice of Knowledge Discovery in Databases. pp -.
[46] Wang, C. & Wang, S. (). Supporting content-based searches on time series via approximation. International Conference on Scientific and Statistical Database Management.
[47] Weigend, A. (1994). The Santa Fe Time Series Competition Data [http://www.ster.yu.edu/~aweged/tme-seres/satafe.html]
[48] Welch, D. & Quinn, P. (1999). http://wwwmacho.mcmaster.ca/project/overvew/status.html
[49] Wu, Y., Agrawal, D. & Abbadi, A. (2000). A Comparison of DFT and DWT based Similarity Search in Time-Series Databases. Proceedings of the 9th International Conference on Information and Knowledge Management.
[50] Wu, D., Agrawal, D., El Abbadi, A., Singh, A. & Smith, T. R. (1996). Efficient retrieval for browsing large image databases. Proc of the 5th International Conference on Knowledge Information. pp -8, Rockville, MD.
[51] Yi, B. K., Jagadish, H., & Faloutsos, C. (1998). Efficient retrieval of similar time sequences under time warping. IEEE International Conference on Data Engineering. pp -8.
[52] Yi, B. K. & Faloutsos, C. (2000). Fast time sequence indexing for arbitrary Lp norms. Proceedings of the 26th International Conference on Very Large Databases, Cairo, Egypt.