Experiencing SAX: a Novel Symbolic Representation of Time Series

JESSICA LIN jessica@ise.gmu.edu
Information and Software Engineering Department, George Mason University, Fairfax, VA 22030

EAMONN KEOGH eamonn@cs.ucr.edu
LI WEI wli@cs.ucr.edu
STEFANO LONARDI stelo@cs.ucr.edu
Computer Science & Engineering Department, University of California Riverside, Riverside, CA 92521

Abstract. Many high level representations of time series have been proposed for data mining, including Fourier transforms, wavelets, eigenwaves, piecewise polynomial models, etc. Many researchers have also considered symbolic representations of time series, noting that such representations would potentially allow researchers to avail of the wealth of data structures and algorithms from the text processing and bioinformatics communities. While many symbolic representations of time series have been introduced over the past decades, they all suffer from two fatal flaws. Firstly, the dimensionality of the symbolic representation is the same as the original data, and virtually all data mining algorithms scale poorly with dimensionality. Secondly, although distance measures can be defined on the symbolic approaches, these distance measures have little correlation with distance measures defined on the original time series. In this work we formulate a new symbolic representation of time series. Our representation is unique in that it allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic approach that lower bound corresponding distance measures defined on the original series. As we shall demonstrate, this latter feature is particularly exciting because it allows one to run certain data mining algorithms on the efficiently manipulated symbolic representation, while producing identical results to the algorithms that operate on the original data. In particular, we will demonstrate the utility of our representation on various data mining tasks of clustering, classification, query by content, anomaly detection, motif discovery, and visualization.

Keywords: Time Series, Data Mining, Symbolic Representation, Discretize

1. Introduction

Many high level representations of time series have been proposed for data mining. Figure 1 illustrates a hierarchy of all the various time series representations in the literature [3,, 0, 4, 7, 30, 36, 53, 54, 63].
One representation that the data mining community has not considered in detail is the discretization of the original data into symbolic strings. At first glance this seems a surprising oversight. There is an enormous wealth of existing algorithms and data structures that allow the efficient manipulation of strings. Such algorithms have received decades of attention in the text retrieval community, and more recent attention from the bioinformatics community [5, 9, 5, 5, 57, 60]. Some simple examples of "tools" that are not defined for real-valued sequences but are defined for symbolic approaches include hashing, Markov models, suffix trees, decision trees, etc.

There is, however, a simple explanation for the data mining community's lack of interest in string manipulation as a supporting technique for mining time series. If the data are transformed into virtually any of the other representations depicted in Figure 1, then it is possible to measure the similarity of two time series in that representation space, such that the distance is guaranteed to lower bound the true distance between the time series in the original space. (Footnote: The exceptions are random mappings, which are only guaranteed to be within an epsilon of the true distance with a certain probability, trees, interpolation and natural language.) This simple fact is at the core of almost all algorithms in time series data mining and indexing [0]. However, in spite of the fact that there are dozens of techniques for producing different variants of the symbolic representation [3, 5, 7], there is no known method for calculating the distance in the symbolic space, while providing the lower bounding guarantee.

In addition to allowing the creation of lower bounding distance measures, there is one other highly desirable property of any time series representation, including a symbolic one. Almost all time series datasets are very high dimensional. This is a challenging fact because all non-trivial data mining and indexing algorithms degrade exponentially with dimensionality. For example, above 16-20 dimensions, index structures degrade to sequential scanning [6]. None of the symbolic representations that we are aware of allow dimensionality reduction [3, 5, 7]. There is some reduction in the storage space required, since fewer bits are required for each value; however, the intrinsic dimensionality of the symbolic representation is the same as the original data.

[Figure 1 node labels. Time Series Representations. Data Adaptive: Sorted Coefficients; Piecewise Polynomial (Piecewise Linear Approximation: Interpolation, Regression; Adaptive Piecewise Constant Approximation); Singular Value Decomposition; Symbolic (Natural Language; Strings: Lower Bounding, Non-Lower Bounding); Trees. Non Data Adaptive: Wavelets (Orthonormal: Haar, Daubechies dbn n > 1; Bi-Orthonormal: Coiflets, Symlets); Random Mappings; Spectral (Discrete Fourier Transform, Discrete Cosine Transform); Piecewise Aggregate Approximation.]

Figure 1: A hierarchy of all the various time series representations in the literature. The leaf nodes refer to the actual representation, and the internal nodes refer to the classification of the approach. The contribution of this paper is to introduce a new representation, the lower bounding symbolic approach.

There is no doubt that a new symbolic representation that remedies all the problems mentioned above would be highly desirable. More specifically, the symbolic representation should meet the following criteria: space efficiency, time efficiency (fast indexing), and correctness of answer sets (no false dismissals). In this work we formally formulate a novel symbolic representation and show its utility on other time series tasks.
Our representation is unique in that it allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic representation that lower bound corresponding popular distance measures defined on the original data. As we shall demonstrate, the latter feature is particularly exciting because it allows one to run certain data mining algorithms on the efficiently manipulated symbolic representation, while producing identical results to the algorithms that operate on the original data. In particular, we will demonstrate the utility of our representation on the classic data mining tasks of clustering [9], classification [4], indexing [, 0, 30, 63], and anomaly detection [4, 34, 54].

The rest of this paper is organized as follows. Section 2 briefly discusses background material on time series data mining and related work. Section 3 introduces our novel symbolic approach, and discusses its dimensionality reduction, numerosity reduction and lower bounding abilities. Section 4 contains an experimental evaluation of the symbolic approach on a variety of data mining tasks. The impact of the symbolic approach is also discussed. Finally, Section 5 offers some conclusions and suggestions for future work.

2. Background and Related Work

Time series data mining has attracted enormous attention in the last decade. The review below is necessarily brief; we refer interested readers to [3, 53] for a more in depth review.

(Footnote: A preliminary version of this paper appears in [4].)

2.1 Time Series Data Mining Tasks

While making no pretence to be exhaustive, the following list summarizes the areas that have seen the majority of research interest in time series data mining.

Indexing: Given a query time series Q, and some similarity/dissimilarity measure D(Q,C), find the most similar time series in database DB [,, 0, 30, 63].
Clustering: Find natural groupings of the time series in database DB under some similarity/dissimilarity measure D(Q,C) [9, 36].
Classification: Given an unlabeled time series Q, assign it to one of two or more predefined classes [4].
Summarization: Given a time series Q containing n datapoints where n is an extremely large number, create a (possibly graphic) approximation of Q which retains its essential features but fits on a single page, computer screen, executive summary, etc. [43].
Anomaly Detection: Given a time series Q, and some model of "normal" behavior, find all sections of Q which contain anomalies or "surprising/interesting/unexpected/novel" behavior [4, 34, 54].

Since the datasets encountered by data miners typically don't fit in main memory, and disk I/O tends to be the bottleneck for any data mining task, a simple generic framework for time series data mining has emerged [0]. The basic approach is outlined in Table 1.

Table 1: A generic time series data mining approach.
1. Create an approximation of the data, which will fit in main memory, yet retains the essential features of interest.
2. Approximately solve the task at hand in main memory.
3. Make (hopefully very few) accesses to the original data on disk to confirm the solution obtained in Step 2, or to modify the solution so it agrees with the solution we would have obtained on the original data.

It should be clear that the utility of this framework depends heavily on the quality of the approximation created in Step 1. If the approximation is very faithful to the original data, then the solution obtained in main memory is likely to be the same as, or very close to, the solution we would have obtained on the original data. The handful of disk accesses made in Step 3 to confirm or slightly modify the solution will be inconsequential compared to the number of disk accesses required had we worked on the original data. With this in mind, there has been great interest in approximate representations of time series, which we consider below.
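The three steps of the generic framework can be sketched for the indexing task as follows. This is only an illustration under stated assumptions: the approximation is a piecewise mean (the PAA representation reviewed later in the paper), the "disk" is an ordinary in-memory list, and the function names `approximate`, `lower_bound` and `nearest_neighbor` are hypothetical, not taken from the paper.

```python
import math

def euclidean(q, c):
    """True distance, computed on the raw series (the expensive 'disk' step)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(q, c)))

def approximate(series, w):
    """Step 1: piecewise-mean approximation; assumes len(series) is divisible by w."""
    frame = len(series) // w
    return [sum(series[i * frame:(i + 1) * frame]) / frame for i in range(w)]

def lower_bound(q_approx, c_approx, n):
    """Distance on the approximations, scaled so it never exceeds the true distance."""
    return math.sqrt(n / len(q_approx)) * euclidean(q_approx, c_approx)

def nearest_neighbor(query, database, w):
    """Return (index, distance, raw_accesses) of the exact nearest neighbor."""
    n = len(query)
    q_approx = approximate(query, w)
    # Step 2: rank every candidate in memory by its lower-bound distance.
    # (A real system would precompute and store the approximations.)
    bounds = sorted(
        (lower_bound(q_approx, approximate(c, w), n), i)
        for i, c in enumerate(database)
    )
    best_index, best_dist, raw_accesses = None, math.inf, 0
    # Step 3: confirm on the raw data, stopping once no candidate can win.
    for bound, i in bounds:
        if bound >= best_dist:
            break
        raw_accesses += 1
        d = euclidean(query, database[i])
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist, raw_accesses
```

Because the ranking distance never overestimates the true distance, the early exit cannot discard the true nearest neighbor; without that guarantee the loop would have to visit every raw series.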
2.2 Time Series Representations

As with most problems in computer science, the suitable choice of representation greatly affects the ease and efficiency of time series data mining. With this in mind, a great number of time series representations have been introduced, including the Discrete Fourier Transform (DFT) [0], the Discrete Wavelet Transform (DWT) [], Piecewise Linear, and Piecewise Constant models (PAA) [30], (APCA) [4, 30], and Singular Value Decomposition (SVD) [30]. Figure 2 illustrates the most commonly used representations.

Recent work suggests that there is little to choose between the above in terms of indexing power [3]; however, the representations have other features that may act as strengths or weaknesses. As a simple example, wavelets have the useful multiresolution property, but are only defined for time series that are an integer power of two in length [].

One important feature of all the above representations is that they are real valued. This limits the algorithms, data structures and definitions available for them. For example, in anomaly detection we cannot meaningfully define the probability of observing any particular set of wavelet coefficients, since the probability of observing any real number is zero [38]. Such limitations have led researchers to consider using a symbolic representation of time series.

[Figure 2 panel titles: Discrete Fourier Transform; Piecewise Linear Approximation; Haar Wavelet; Adaptive Piecewise Constant Approximation]

Figure 2: The most common representations for time series data mining. Each can be visualized as an attempt to approximate the signal with a linear combination of basis functions.

While there are literally hundreds of papers on discretizing (symbolizing, tokenizing, quantizing) time series [3, 7] (see [5] for an extensive survey), none of the techniques allows a distance measure that lower bounds a distance measure defined on the original time series. For this reason, the generic time series data mining approach illustrated in Table 1 is of little utility, since the approximate solution to the problem created in main memory may be arbitrarily dissimilar to the true solution that would have been obtained on the original data. If, however, one had a symbolic approach that allowed lower bounding of the true distance, one could take advantage of the generic time series data mining model, and of a host of other algorithms, definitions and data structures which are only defined for discrete data, including hashing, Markov models, and suffix trees. This is exactly the contribution of this paper. We call our symbolic representation of time series SAX (Symbolic Aggregate approXimation), and define it in the next section.

3. SAX: Our Symbolic Approach

SAX allows a time series of arbitrary length n to be reduced to a string of arbitrary length w (w < n, typically w << n). The alphabet size is also an arbitrary integer a, where a > 2. Table 2 summarizes the major notation used in this and subsequent sections.

Table 2: A summarization of the notation used in this paper
$C$ : A time series $C = c_1, \ldots, c_n$
$\bar{C}$ : A Piecewise Aggregate Approximation of a time series, $\bar{C} = \bar{c}_1, \ldots, \bar{c}_w$
$\hat{C}$ : A symbolic representation of a time series, $\hat{C} = \hat{c}_1, \ldots, \hat{c}_w$
$w$ : The number of PAA segments representing time series $C$
$a$ : Alphabet size (e.g., for the alphabet = {a, b, c}, a = 3)

Our discretization procedure is unique in that it uses an intermediate representation between the raw time series and the symbolic strings. We first transform the data into the Piecewise Aggregate Approximation (PAA) representation and then symbolize the PAA representation into a discrete string.
There are two important advantages to doing this:

Dimensionality Reduction: We can use the well-defined and well-documented dimensionality reduction power of PAA [30, 63], and the reduction is automatically carried over to the symbolic representation.
Lower Bounding: Proving that a distance measure between two symbolic strings lower bounds the true distance between the original time series is non-trivial. The key observation that allowed us to prove lower bounds is to concentrate on proving that the symbolic distance measure lower bounds the PAA distance measure. Then we can prove the desired result by transitivity by simply pointing to the existing proofs for the PAA representation itself [3, 63].

We will briefly review the PAA technique before considering the symbolic extension.

3.1 Dimensionality Reduction Via PAA

A time series $C$ of length $n$ can be represented in a $w$-dimensional space by a vector $\bar{C} = \bar{c}_1, \ldots, \bar{c}_w$. The $i$-th element of $\bar{C}$ is calculated by the following equation:

$$\bar{c}_i = \frac{w}{n} \sum_{j = \frac{n}{w}(i-1)+1}^{\frac{n}{w} i} c_j \qquad (1)$$

Simply stated, to reduce the time series from n dimensions to w dimensions, the data is divided into w equal sized "frames". The mean value of the data falling within a frame is calculated and a vector of these values becomes the data-reduced representation. The representation can be visualized as an attempt to approximate the original time series with a linear combination of box basis functions as shown in Figure 3. For simplicity and clarity, we assume that n is divisible by w. We will relax this assumption in Section 3.5.

Figure 3: The PAA representation can be visualized as an attempt to model a time series with a linear combination of box basis functions. In this case, a sequence of length 128 is reduced to 8 dimensions.

The PAA dimensionality reduction is intuitive and simple, yet has been shown to rival more sophisticated dimensionality reduction techniques like Fourier transforms and wavelets [30, 3, 63]. We normalize each time series to have a mean of zero and a standard deviation of one before converting it to the PAA representation, since it is well understood that it is meaningless to compare time series with different offsets and amplitudes [3].

3.2 Discretization

Having transformed a time series database into the PAA we can apply a further transformation to obtain a discrete representation. It is desirable to have a discretization technique that will produce symbols with equiprobability [5, 45]. This is easily achieved since normalized time series have a Gaussian distribution [38]. To illustrate this, we extracted subsequences of length 128 from 8 different time series and plotted normal probability plots of the data as shown in Figure 4.
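As a concrete companion to Section 3.1, the normalization and the PAA reduction of Eq. 1 can be sketched as below. The sketch makes two assumptions that the text does not fix: the standard deviation is the population one (dividing by n), and a near-constant input is mapped to all zeros through a hypothetical `eps` threshold instead of being divided by a vanishing deviation.

```python
import math

def z_normalize(series, eps=1e-8):
    """Rescale a series to mean 0 and standard deviation 1."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std < eps:
        # Assumption: treat a (near-)constant series as flat rather than amplify noise.
        return [0.0] * n
    return [(x - mean) / std for x in series]

def paa(series, w):
    """Eq. 1: mean of each of w equal-sized frames; requires n divisible by w."""
    n = len(series)
    if n % w != 0:
        raise ValueError("this sketch assumes n is divisible by w")
    frame = n // w
    return [sum(series[i * frame:(i + 1) * frame]) / frame for i in range(w)]
```

For instance, `paa([1, 2, 3, 4, 5, 6, 7, 8], 4)` averages the pairs (1, 2), (3, 4), (5, 6) and (7, 8).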
A normal probability plot is a graphical technique that shows if the data is approximately normally distributed []: an approximate straight line indicates that the data is approximately normally distributed. As the figure shows, the highly linear nature of the plots suggests that the data is approximately normal. For a large family of the time series data at our disposal, we note that the Gaussian assumption is indeed true. For the small subset of data where the assumption is not obeyed, the efficiency is slightly deteriorated; however, the correctness of the algorithm is unaffected. The correctness of the algorithm is guaranteed by the lower-bounding property of the distance measure in the symbolic space, which we will explain in the next section.

Given that the normalized time series have a highly Gaussian distribution, we can simply determine the "breakpoints" that will produce a equal-sized areas under the Gaussian curve [38].

Definition 1. Breakpoints: breakpoints are a sorted list of numbers $B = \beta_1, \ldots, \beta_{a-1}$ such that the area under a N(0,1) Gaussian curve from $\beta_i$ to $\beta_{i+1}$ is $1/a$ ($\beta_0$ and $\beta_a$ are defined as $-\infty$ and $\infty$, respectively).

These breakpoints may be determined by looking them up in a statistical table. For example, Table 3 gives the breakpoints for values of a from 3 to 10.

Figure 4: A normal probability plot of the distribution of values from subsequences of length 128 from 8 different datasets. The highly linear nature of the plot strongly suggests that the data came from a Gaussian distribution.

Table 3: A lookup table that contains the breakpoints that divide a Gaussian distribution in an arbitrary number (from 3 to 10) of equiprobable regions.
[Table 3 body: rows $\beta_1$ to $\beta_9$, columns a = 3 to 10; the numeric entries did not survive extraction.]
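Instead of a printed statistical table, the breakpoints of Definition 1 can be computed from the inverse cumulative distribution function of the standard normal. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `breakpoints` is an illustrative choice, not something defined in the paper.

```python
from statistics import NormalDist

def breakpoints(a):
    """Return beta_1 .. beta_{a-1}, cutting N(0, 1) into a equiprobable regions."""
    if a < 2:
        raise ValueError("alphabet size must be at least 2")
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    # beta_i is the point below which a fraction i/a of the probability mass lies.
    return [standard_normal.inv_cdf(i / a) for i in range(1, a)]

if __name__ == "__main__":
    for a in range(3, 11):
        print(a, [round(b, 2) for b in breakpoints(a)])
```

For a = 4 the three breakpoints are approximately -0.67, 0 and 0.67, the quartiles of the standard normal distribution.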

Once the breakpoints have been obtained we can discretize a time series in the following manner. We first obtain a PAA of the time series. All PAA coefficients that are below the smallest breakpoint are mapped to the symbol "a", all coefficients greater than or equal to the smallest breakpoint and less than the second smallest breakpoint are mapped to the symbol "b", etc. Figure 5 illustrates the idea.

Figure 5: A time series is discretized by first obtaining a PAA approximation and then using predetermined breakpoints to map the PAA coefficients into SAX symbols. In the example above, with n = 128, w = 8 and a = 3, the time series is mapped to the word baabccbc.

Note that in this example the 3 symbols, "a", "b" and "c", are approximately equiprobable as we desired. We call the concatenation of symbols that represent a subsequence a word.

Definition 2. Word: A subsequence $C$ of length $n$ can be represented as a word $\hat{C} = \hat{c}_1, \ldots, \hat{c}_w$ as follows. Let $\mathrm{alpha}_i$ denote the $i$-th element of the alphabet, i.e., $\mathrm{alpha}_1$ = a and $\mathrm{alpha}_2$ = b. Then the mapping from a PAA approximation $\bar{C}$ to a word $\hat{C}$ is obtained as follows:

$$\hat{c}_i = \mathrm{alpha}_j \quad \text{iff} \quad \beta_{j-1} \le \bar{c}_i < \beta_j \qquad (2)$$

We have now defined our symbolic representation (the PAA representation is merely an intermediate step required to obtain the symbolic representation).

Recently, [6] has empirically and theoretically shown some very promising clustering results for "clipping", that is to say, converting the time series to a binary vector. They demonstrated that discretizing the time series before clustering significantly improves the accuracy in the presence of outliers. We note that clipping is actually a special case of SAX, where a = 2.

3.3 Distance Measures

Having introduced the new representation of time series, we can now define a distance measure on it. By far the most common distance measure for time series is the Euclidean distance [3, 5]. Given two time series $Q$ and $C$ of the same length $n$, Eq. 3 defines their Euclidean distance, and Figure 6A illustrates a visual intuition of the measure.

$$D(Q, C) \equiv \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2} \qquad (3)$$

If we transform the original subsequences into PAA representations, $\bar{Q}$ and $\bar{C}$, using Eq. 1, we can then obtain a lower bounding approximation of the Euclidean distance between the original subsequences by:

$$DR(\bar{Q}, \bar{C}) \equiv \sqrt{\frac{n}{w}} \sqrt{\sum_{i=1}^{w} (\bar{q}_i - \bar{c}_i)^2} \qquad (4)$$

This measure is illustrated in Figure 6B.
If we further transform the data into the symbolic representation, we can define a MINDIST function that returns the minimum distance between the original time series of two words:

$$MINDIST(\hat{Q}, \hat{C}) \equiv \sqrt{\frac{n}{w}} \sqrt{\sum_{i=1}^{w} \left( dist(\hat{q}_i, \hat{c}_i) \right)^2} \qquad (5)$$

The function resembles Eq. 4 except for the fact that the distance between the two PAA coefficients has been replaced with the sub-function dist(). The dist() function can be implemented using a table lookup as illustrated in Table 4.

Table 4: A lookup table used by the MINDIST function. This table is for an alphabet of cardinality of 4, i.e. a = 4. The distance between two symbols can be read off by examining the corresponding row and column. For example, dist(a,b) = 0 and dist(a,c) = 0.67.
[Table 4 body: a 4 by 4 grid with rows and columns a, b, c, d; the numeric entries did not survive extraction.]

The value in cell (r,c) for any lookup table can be calculated by the following expression.

$$cell_{r,c} = \begin{cases} 0, & \text{if } |r - c| \le 1 \\ \beta_{\max(r,c)-1} - \beta_{\min(r,c)}, & \text{otherwise} \end{cases} \qquad (6)$$

For a given value of the alphabet size a, the table needs only be calculated once, then stored for fast lookup. The MINDIST function can be visualized in Figure 6C.

Figure 6: A visual intuition of the three representations discussed in this work, and the distance measures defined on them. A) The Euclidean distance between two time series can be visualized as the square root of the sum of the squared differences of each pair of corresponding points. B) The distance measure defined for the PAA approximation can be seen as the square root of the sum of the squared differences between each pair of corresponding PAA coefficients, multiplied by the square root of the compression rate. C) The distance between two SAX representations of a time series requires looking up the distances between each pair of symbols, squaring them, summing them, taking the square root and finally multiplying by the square root of the compression rate. The two words shown in panel C are baabccbc and babcacca.
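The symbol mapping of Eq. 2, the lookup table of Eq. 6 and the MINDIST function of Eq. 5 can be sketched together as follows. The function names are illustrative, symbols are assumed to be the lowercase letters starting at "a", and breakpoints are computed with `statistics.NormalDist` rather than read from a printed table; note that the code indexes symbols from 0 while Eq. 6 indexes them from 1.

```python
import math
from bisect import bisect_right
from statistics import NormalDist

def breakpoints(a):
    """beta_1 .. beta_{a-1} for an alphabet of size a."""
    return [NormalDist().inv_cdf(i / a) for i in range(1, a)]

def sax_word(paa_values, a):
    """Eq. 2: values below beta_1 map to 'a', values in [beta_1, beta_2) to 'b', etc."""
    betas = breakpoints(a)
    return "".join(chr(ord("a") + bisect_right(betas, v)) for v in paa_values)

def dist_table(a):
    """Eq. 6, with 0-based symbol indices r and c."""
    betas = breakpoints(a)
    table = [[0.0] * a for _ in range(a)]
    for r in range(a):
        for c in range(a):
            if abs(r - c) > 1:
                # 1-based beta_{max-1} - beta_{min} becomes betas[max-1] - betas[min] here.
                table[r][c] = betas[max(r, c) - 1] - betas[min(r, c)]
    return table

def mindist(word_q, word_c, n, a):
    """Eq. 5: lower bound on the distance between the original series of length n."""
    if len(word_q) != len(word_c):
        raise ValueError("words must have the same length")
    w = len(word_q)
    table = dist_table(a)
    squared = sum(
        table[ord(x) - ord("a")][ord(y) - ord("a")] ** 2
        for x, y in zip(word_q, word_c)
    )
    return math.sqrt(n / w) * math.sqrt(squared)
```

In a real system the table would be built once per alphabet size and reused; rebuilding it inside `mindist` only keeps the sketch short.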

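As mentioned, one of the most important characteristics of SAX is that it provides a lower-bounding distance measure. Below, we show that MINDIST lower-bounds the Euclidean distance in two steps. First, we will show that the PAA distance lower-bounds the Euclidean distance. The proof has appeared in [3] by the current author; for completeness, we repeat the proof here. Next, we will show that MINDIST lower-bounds the PAA distance, which in turn, by transitivity, shows that MINDIST lower-bounds the Euclidean distance.

Step 1: We need to show that the PAA distance lower-bounds the Euclidean distance; that is, $D(Q, C) \ge DR(\bar{Q}, \bar{C})$. We will show the proof on the case where there is a single PAA frame (i.e. mapping the time series into one single PAA coefficient). A more generalized proof for N frames can be obtained by applying the single-frame proof on every frame.

Proof: Using the same notations as Eq. 3 and Eq. 4, we want to prove that

$$\sqrt{\sum_{i=1}^{n} (q_i - c_i)^2} \ge \sqrt{\frac{n}{w}} \sqrt{\sum_{i=1}^{w} (\bar{q}_i - \bar{c}_i)^2} \qquad (7)$$

Let $\bar{Q}$ and $\bar{C}$ be the means of time series $Q$ and $C$, respectively. Since we are considering only the single-frame case, Ineq. 7 can be rewritten as:

$$\sqrt{\sum_{i=1}^{n} (q_i - c_i)^2} \ge \sqrt{n} \sqrt{(\bar{Q} - \bar{C})^2} \qquad (8)$$

Squaring both sides we get

$$\sum_{i=1}^{n} (q_i - c_i)^2 \ge n (\bar{Q} - \bar{C})^2 \qquad (9)$$

Each point $q_i$ in $Q$ can be represented in terms of $\bar{Q}$, i.e. $q_i = \bar{Q} - \Delta q_i$. The same applies to each point $c_i$ in $C$. Thus, Ineq. 9 can be rewritten as:

$$\sum_{i=1}^{n} \left( (\bar{Q} - \Delta q_i) - (\bar{C} - \Delta c_i) \right)^2 \ge n (\bar{Q} - \bar{C})^2 \qquad (10)$$

Re-arranging the left-hand side we get

$$\sum_{i=1}^{n} \left( (\bar{Q} - \bar{C}) - (\Delta q_i - \Delta c_i) \right)^2 \ge n (\bar{Q} - \bar{C})^2 \qquad (11)$$

We can expand and rewrite Ineq. 11 as:

$$\sum_{i=1}^{n} \left( (\bar{Q} - \bar{C})^2 - 2 (\bar{Q} - \bar{C})(\Delta q_i - \Delta c_i) + (\Delta q_i - \Delta c_i)^2 \right) \ge n (\bar{Q} - \bar{C})^2 \qquad (12)$$

By distributive law we get: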

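$$\sum_{i=1}^{n} (\bar{Q} - \bar{C})^2 - \sum_{i=1}^{n} 2 (\bar{Q} - \bar{C})(\Delta q_i - \Delta c_i) + \sum_{i=1}^{n} (\Delta q_i - \Delta c_i)^2 \ge n (\bar{Q} - \bar{C})^2 \qquad (13)$$

Or

$$n (\bar{Q} - \bar{C})^2 - 2 (\bar{Q} - \bar{C}) \sum_{i=1}^{n} (\Delta q_i - \Delta c_i) + \sum_{i=1}^{n} (\Delta q_i - \Delta c_i)^2 \ge n (\bar{Q} - \bar{C})^2 \qquad (14)$$

Recall that $q_i = \bar{Q} - \Delta q_i$, which means that $\Delta q_i = \bar{Q} - q_i$, and similarly, $\Delta c_i = \bar{C} - c_i$. Therefore, the summation part of the second term on the left-hand side of the inequality becomes:

$$\sum_{i=1}^{n} (\Delta q_i - \Delta c_i) = \sum_{i=1}^{n} \left( (\bar{Q} - q_i) - (\bar{C} - c_i) \right) = \left( n \bar{Q} - \sum_{i=1}^{n} q_i \right) - \left( n \bar{C} - \sum_{i=1}^{n} c_i \right) = \left( \sum_{i=1}^{n} q_i - \sum_{i=1}^{n} q_i \right) - \left( \sum_{i=1}^{n} c_i - \sum_{i=1}^{n} c_i \right) = 0$$

Substituting 0 into the second term on the left-hand side, Ineq. 14 becomes:

$$n (\bar{Q} - \bar{C})^2 - 0 + \sum_{i=1}^{n} (\Delta q_i - \Delta c_i)^2 \ge n (\bar{Q} - \bar{C})^2 \qquad (15)$$

Cancelling out $n (\bar{Q} - \bar{C})^2$ on both sides of the inequality, we get

$$\sum_{i=1}^{n} (\Delta q_i - \Delta c_i)^2 \ge 0 \qquad (16)$$

which always holds true, hence completes the proof.

Step 2: Continuing from Step 1 and using the same methodology, we will now show that MINDIST lower-bounds the PAA distance; that is, we will show that

$$n (\bar{Q} - \bar{C})^2 \ge n \left( dist(\hat{Q}, \hat{C}) \right)^2 \qquad (17)$$

Let a = 1, b = 2, and so forth; there are two possible scenarios:

Case 1: $|\hat{Q} - \hat{C}| \le 1$. In other words, the symbols representing the two time series are either the same, or consecutive from the alphabet, e.g. $\hat{Q} = \hat{C}$ = 'a', or $\hat{Q}$ = 'a' and $\hat{C}$ = 'b'. From Eq. 6, we know that the MINDIST is 0 in this case. Therefore, the right-hand side of Ineq. 17 becomes zero, which makes the inequality always hold true.

Case 2: $|\hat{Q} - \hat{C}| > 1$. In other words, the symbols representing the two time series are at least two alphabets apart, e.g. $\hat{Q}$ = 'c' and $\hat{C}$ = 'a'. For simplicity, assume $\hat{Q} > \hat{C}$; the case where $\hat{Q} < \hat{C}$ can be proven in similar fashion. According to Eq. 6, $dist(\hat{Q}, \hat{C})$ is

$$dist(\hat{Q}, \hat{C}) = \beta_{\hat{Q}-1} - \beta_{\hat{C}} \qquad (18)$$

For the example above, $dist(\text{'c'}, \text{'a'}) = \beta_2 - \beta_1$.

Recall that Eq. 2 states the following: $\hat{c}_i = \mathrm{alpha}_j$ iff $\beta_{j-1} \le \bar{c}_i < \beta_j$. So we know that

$$\beta_{\hat{Q}-1} \le \bar{Q} < \beta_{\hat{Q}}, \qquad \beta_{\hat{C}-1} \le \bar{C} < \beta_{\hat{C}} \qquad (19)$$

Substituting Eq. 18 into Ineq. 17 we get

$$n (\bar{Q} - \bar{C})^2 \ge n (\beta_{\hat{Q}-1} - \beta_{\hat{C}})^2 \qquad (20)$$

which implies that

$$|\bar{Q} - \bar{C}| \ge |\beta_{\hat{Q}-1} - \beta_{\hat{C}}| \qquad (21)$$

Note that from our assumptions earlier that $|\hat{Q} - \hat{C}| > 1$ and $\hat{Q} > \hat{C}$ (i.e. $\bar{Q}$ is at a higher region than $\bar{C}$), we can drop the absolute value notations on both sides:

$$\bar{Q} - \bar{C} \ge \beta_{\hat{Q}-1} - \beta_{\hat{C}} \qquad (22)$$

Rearranging the terms we get:

$$(\bar{Q} - \beta_{\hat{Q}-1}) - (\bar{C} - \beta_{\hat{C}}) \ge 0 \qquad (23)$$

which we know always holds true since, from Ineq. 19, we know that

$$\bar{Q} - \beta_{\hat{Q}-1} \ge 0, \qquad \bar{C} - \beta_{\hat{C}} < 0 \qquad (24)$$

This completes the proof for $\hat{Q} > \hat{C}$. The case where $\hat{Q} < \hat{C}$ can be proven similarly, and is omitted for brevity.

There is one issue we must address if we are to use a symbolic representation of time series. If we wish to approximate a massive dataset in main memory, the parameters w and a have to be chosen in such a way that the approximation makes the best use of the primary memory available. There is a clear tradeoff between the parameter w controlling the number of approximating elements, and the value a controlling the granularity of each approximating element.

It is infeasible to determine the best tradeoff analytically, since it is highly data dependent. We can however empirically determine the best values with a simple experiment. Since we wish to achieve the tightest possible lower bounds, we can simply estimate the lower bounds over all possible feasible parameters, and choose the best settings.

$$\text{Tightness of Lower Bound} = \frac{MINDIST(\hat{Q}, \hat{C})}{D(Q, C)} \qquad (25)$$

We performed such a test with a concatenation of 50 time series databases taken from the UCR time series data mining archive. For every combination of parameters we averaged the result of 100,000 experiments on subsequences of length 256. Figure 7 shows the results.

[Figure 7 axes: Tightness of lower bound (vertical); Word Size w and Alphabet size a (horizontal)]

Figure 7: The empirically estimated tightness of lower bounds over the cross product of a = [3 ... 11] and w = [2 ... 9]. The darker histogram bars illustrate combinations of parameters that require approximately equal space to store every possible word.

The results suggest that using a low value for a results in weak bounds. While it is intuitive that larger alphabet sizes yield better results, there are diminishing returns as a increases. If space is an issue, an alphabet size in the range 5 to 8 seems to be a good choice that offers a reasonable balance between space and tightness of lower bound (each alphabet within this range can be represented with just 3 bits). Increasing the alphabet size would require more bits to represent each alphabet.

We end this section with a visual comparison between SAX and the four most used representations in the literature (Figure 8). We can see that SAX preserves the general shape of the original time series.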
Note that since SAX is a symbolic representation, the alphabets can be stored as bits rather than doubles, which results in a considerable amount of space-saving. Therefore, the SAX representation can afford to have higher dimensionality than the other real-valued approaches, while using less or the same amount of space.
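The tightness of lower bound (Eq. 25) can be estimated on synthetic data with a short experiment. The sketch below is not the authors' experiment: it assumes random-walk series in place of the archive data, a population standard deviation for normalization, and hypothetical function names and default parameters.

```python
import math
import random
from bisect import bisect_right
from statistics import NormalDist

def z_normalize(series):
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    return [(x - mean) / std for x in series]

def paa(series, w):
    frame = len(series) // w  # assumes len(series) is divisible by w
    return [sum(series[i * frame:(i + 1) * frame]) / frame for i in range(w)]

def tightness(q, c, w, a):
    """MINDIST of the two SAX words divided by the true Euclidean distance."""
    n = len(q)
    betas = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    symbols_q = [bisect_right(betas, v) for v in paa(q, w)]
    symbols_c = [bisect_right(betas, v) for v in paa(c, w)]
    squared = 0.0
    for r, s in zip(symbols_q, symbols_c):
        if abs(r - s) > 1:  # Eq. 6: equal or adjacent symbols contribute 0
            squared += (betas[max(r, s) - 1] - betas[min(r, s)]) ** 2
    mindist = math.sqrt(n / w) * math.sqrt(squared)
    true_dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(q, c)))
    return mindist / true_dist

def random_walk(n, rng):
    value, walk = 0.0, []
    for _ in range(n):
        value += rng.gauss(0.0, 1.0)
        walk.append(value)
    return walk

def average_tightness(w, a, trials=200, n=64, seed=0):
    """Average Eq. 25 over pairs of normalized random walks."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        q = z_normalize(random_walk(n, rng))
        c = z_normalize(random_walk(n, rng))
        total += tightness(q, c, w, a)
    return total / trials
```

Every individual ratio lies between 0 and 1 because MINDIST is a lower bound; sweeping `w` and `a` over a grid gives a rough analogue of the parameter study described above.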

[Figure 8 panel titles: DFT; PLA; Haar; APCA; SAX, with symbol levels a to f]

Figure 8: A visual comparison of SAX and the four most common time series data mining representations. A raw time series of length 128 is transformed into the word ffffffeeeddcbaabceedcbaaaaacddee. This is a fair comparison since the number of bits in each representation is the same.

3.4 Numerosity Reduction

We have seen that, given a single time series, our approach can significantly reduce its dimensionality. In addition, our approach can reduce the numerosity of the data for some applications.

Most applications assume that we have one very long time series T, and that manageable subsequences of length n are extracted by use of a sliding window, then stored in a matrix for further manipulation [, 0, 30, 63]. Figure 9 illustrates the idea.

Figure 9: An illustration of the notation introduced in this section: A time series T of length 128, the subsequence $C_{67}$, of length n = 16, and the first 8 subsequences extracted by a sliding window. Note that the sliding windows are overlapping.

When performing sliding windows subsequence extraction, with any of the real-valued representations, we must store all |T| - n + 1 extracted subsequences (in dimensionality reduced form). However, imagine for a moment that we are using our proposed approach. If the first word extracted is aabbcc, and the window is shifted to discover that the second word is also aabbcc, we can reasonably decide not to include the second occurrence of the word in the sliding windows matrix. If we ever need to retrieve all occurrences of aabbcc, we can go to the location pointed to by the first occurrence, and remember to slide to the right, testing to see if the next window is also mapped to the same word. We can stop testing as soon as the word changes. This simple idea is very similar to the run-length-encoding data compression algorithm.

The utility of this optimization depends on the parameters used and the data itself, but it typically yields a numerosity reduction factor of two or three. However, many datasets are characterized by long periods of little or no movement, followed by bursts of activity (seismological data is an obvious example). On these datasets the numerosity reduction factor can be huge. Consider the example shown in Figure 10.
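The numerosity reduction idea can be sketched as a run-length style filter over the words produced by the sliding window. The sketch assumes the words have already been computed, one per window position; `numerosity_reduce` and `expand` are hypothetical names. The text recovers a run by sliding right over the raw data and re-testing each window, whereas `expand` takes the simpler route of using the stored start positions.

```python
def numerosity_reduce(words):
    """Keep (position, word) only where the word differs from the previous window."""
    kept = []
    previous = None
    for position, word in enumerate(words):
        if word != previous:
            kept.append((position, word))
            previous = word
    return kept

def expand(kept, total):
    """Recover the word of every one of the `total` windows from the reduced list."""
    words = []
    for k, (position, word) in enumerate(kept):
        # A run ends where the next recorded word starts, or at the last window.
        end = kept[k + 1][0] if k + 1 < len(kept) else total
        words.extend([word] * (end - position))
    return words
```

On data with long flat stretches, the reduced list holds one entry per run of identical words rather than one entry per window, which is where the large reduction factors come from.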

[Figure 10 panel title: Space Shuttle STS-57 Telemetry]

Figure 10: Sliding window extraction on Space Shuttle Telemetry data, with n = 32. At time point 61, the extracted word is aabbcc, and the next 401 subsequences also map to this word. Only a pointer to the first occurrence must be recorded, thus producing a large reduction in numerosity.

There is only one special case we must consider. As we noted in Section 3.1, we normalize each time series (including subsequences) to have a mean of zero and a standard deviation of one. However, if the subsequence contains only one value, the standard deviation is not defined. More troublesome is the case where the subsequence is almost constant, perhaps 31 zeros and a single 0.0001. If we normalize this subsequence, the single differing element will have its value exploded to 5.48. This situation occurs quite frequently. For example, the last part of the data in Figure 10 appears to be constant, but actually contains tiny amounts of noise. If we were to normalize subsequences extracted from this area, the normalization would magnify the noise to large meaningless patterns.

We can easily deal with this problem: if the standard deviation of the sequence before normalization is below an epsilon ε, we simply assign the entire word to the middle-ranged alphabet symbol.

3.5 Relaxation on the Number of Segments

So far we have described SAX with the assumption that the length of the time series is divisible by the number of segments, i.e. n/w must be an integer. If n is not divisible by w, there will be some points in the time series for which it is unclear which segment they belong to. For example, in Figure 11A, we are dividing 10 data points into 5 segments, and it is obvious that points 1 and 2 should be in segment 1, points 3 and 4 should be in segment 2, and so forth. In Figure 11B, we are dividing 10 data points into 3 segments. It is not clear which segment point 4 should go to: segment 1 or segment 2. The same problem holds for point 7. The assumption that n must be divisible by w clearly limits our choices of w, and is problematic if n is a prime number. Here we show that this need not be the case and provide a simple solution for when n is not divisible by w. Instead of putting the whole point into a segment, we can put part of it.
For example, in Figure 11B, point 4 contributes 1/3 of its weight to segment 1 and 2/3 to segment 2, and point 7 contributes 2/3 of its weight to segment 2 and 1/3 to segment 3. This makes each segment contain exactly 3 1/3 data points and solves the indivisibility problem. This generalization is implemented in the later version of SAX, as well as some of the applications that utilize SAX.
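The fractional assignment can be sketched by treating each data point as w equal sub-units, so that each segment always receives exactly n sub-units. The function below is one possible way to realize the idea under that assumption; the name `paa_fractional` is hypothetical and this is not the code of the later SAX version mentioned above.

```python
def paa_fractional(series, w):
    """PAA for any n >= w: boundary points are split between adjacent segments."""
    n = len(series)
    result = []
    for i in range(w):
        # Work in units of 1/w of a point: point j spans [j*w, (j+1)*w),
        # and segment i spans [i*n, (i+1)*n).
        start, end = i * n, (i + 1) * n
        total = 0.0
        j = start // w
        while j * w < end:
            overlap = min(end, (j + 1) * w) - max(start, j * w)
            total += series[j] * overlap
            j += 1
        # Each segment covers n units in total, so this is the weighted mean.
        result.append(total / n)
    return result
```

With 10 points and 3 segments, the fourth point has an overlap of 1 unit with the first segment and 2 units with the second, matching the 1/3 and 2/3 weights of the example; when n is divisible by w every overlap is a whole point and the result equals the ordinary PAA.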

[Figure 11 panel labels. A: segments S1 to S5. B: segments S1 to S3; point 4 contributes to S1 with weight 1/3 and to S2 with weight 2/3; point 7 contributes to S2 with weight 2/3 and to S3 with weight 1/3.]

Figure 11: A) 10 data points are divided into 5 segments. B) 10 data points are divided into 3 segments. The data points marked with circles contribute to two adjacent segments at the same time.

4. Experimental Validation of Our Symbolic Approach

In this section, we perform various data mining tasks using our symbolic approach and compare the results with other well-known existing approaches.

For clustering, classification, and anomaly detection, we compare the results with the classic Euclidean distance, and with other previously proposed symbolic approaches. Note that none of these other approaches use dimensionality reduction. In the next paragraphs we summarize the strawmen representations that we compare ours to. We choose these two approaches since they are typical representatives of approaches in the literature.

André-Jönsson and Badal [3] proposed the SDA algorithm, which computes the changes between values from one instance to the next, and divides the range into user-predefined sections. The disadvantages of this approach are obvious: prior knowledge of the data distribution of the time series is required in order to set the breakpoints; and the discretized time series does not conserve the general shape or distribution of the data values.

Huang and Yu proposed the IMPACTS algorithm, which uses the change ratio between one time point and the next time point to discretize the time series [7]. The range of change ratios is then divided into equal-sized sections and mapped into symbols. The time series is converted to a discretized collection of change ratios. As with SAX, the user needs to define the cardinality of symbols.

4.1 Clustering

Clustering is one of the most common data mining tasks, being useful in its own right as an exploratory tool, and also as a sub-routine in more complex algorithms [6,, 9]. We consider two clustering algorithms, one of hierarchical clustering, and one of partitional clustering.

4.1.1 Hierarchical Clustering

Comparing hierarchical clusterings is a very good way to compare and contrast similarity measures, since a dendrogram of size N summarizes O(N^2) distance calculations [3].
The evaluation is typically subjective; we simply adjudge which distance measure appears to create the most natural groupings of the data. However, if we know the data labels in advance we can also make objective statements of the quality of the clustering. In Figure 12 we clustered nine time series from the Control Chart dataset, three each from the decreasing trend, upward shift and normal classes.

[Figure 12 panel titles: Euclidean; SAX; IMPACTS (alphabet = 8); SDA]

Figure 12: A comparison of the four distance measures' ability to cluster members of the Control Chart dataset. Complete linkage was used as the agglomeration technique.

In this case we can objectively state that SAX is superior, since it correctly assigns each class to its own subtree. This is simply a side effect due to the smoothing effect of dimensionality reduction. Therefore, it is not surprising that SAX can sometimes outperform the simple Euclidean distance, especially on noisy data, or data with shifting on the time-axis. This fact is demonstrated in the dendrogram produced by Euclidean distance: the normal class, which contains a lot of noise, is not clustered correctly. More generally, we observed that SAX closely mimics Euclidean distance on various datasets.

The reasons that SDA and IMPACTS perform poorly, we observe, are that neither symbolic representation is very descriptive of the general shape of the time series, and that the lack of dimensionality reduction can further distort the results if the data is noisy. What SDA does is essentially differencing the time series, and then discretizing the resulting series. While differencing has been used historically in statistical time series analysis, its purposes (to remove some autocorrelation, and to make a time series stationary) are not always applicable in determination of similarity in data mining. In addition, although computing the derivatives tells the type of change from one time point to the next time point (sharp increase, slight increase, sharp decrease, etc.), this approach doesn't appear very useful since time series data are typically noisy. More specifically, in addition to the overall trends or shapes, there are noises that appear throughout the entire time series. Without any smoothing or dimensionality reduction, these noises are likely to overshadow the actual characteristics of the time series.

To demonstrate why the decreasing trend and the upward shift classes are indistinguishable by the clustering algorithm for SDA, let's look at what the differenced series look like. Figure 13 shows the original time series and their corresponding series after differencing. It is clear that the differenced series from the same class are not any more similar than those from a different class.
As a matter of fact, as we compute the pairwise distances between all 6 differenced series, we realize that the distances are not indicative at all of the classes these data belong to. Table 5 and Table 6 show the inter- and the intra-distances between the series (the series from the decreasing trend class are denoted as Ai, and the series from the upward shift are denoted as Bi).

Interestingly, in [3], the authors show that taking the first derivatives (i.e. differencing) actually worsens the results when compared to using the raw data. They further show that performing piecewise normalization (i.e. normalization on fixed-sized windows rather than on the whole series) on the first derivatives improves the results. Our experimental results validate their observations, as SDA does not do any kind of normalization, whereas piecewise normalization is a part of SAX in the PAA step.

IMPACTS suffers from similar problems as SDA. In addition, it is clear that neither IMPACTS nor SDA can beat simple Euclidean distance, and the discussion above applies to all data mining tasks, since the problems lie in the nature of the representations.

[Figure: panel A, decreasing trend and the same series after differencing; panel B, upward shift and the same series after differencing.]

Figure 3: (A) Time series from the decreasing trend class and the resulting series after differencing. (B) Time series from the upward shift class and the resulting series after differencing.

Table 5: Intra-class distances between the differenced time series from the decreasing trend class.

Table 6: Inter-class distances between the differenced time series from the decreasing trend and the upward shift classes.

[Table residue: the series are labeled A1, A2, A3 and B1, B2, B3; the distance values are not recoverable.]

Partitional Clustering

Although hierarchical clustering is a good sanity check for any proposed distance measure, it has limited utility for data mining because of its poor scalability. The most commonly used data mining clustering algorithm is k-means [], so for completeness we will consider it here. We performed k-means on both the original raw data and our symbolic representation. Figure 4 shows a typical run of k-means on a space telemetry dataset. Both algorithms converge after the same number of iterations. Since the k-means algorithm seeks to optimize the objective function by minimizing the sum of squared intra-cluster errors, we compare and plot the objective functions for each iteration. The objective function for a given clustering is given by Eq. 6, where x_i is a time series and c_m is the center of the cluster that x_i belongs to. The smaller the objective function, the more compact (and thus better) the clusters.

\[ F = \sum_{m=1}^{k} \sum_{i=1}^{N} \lVert x_i - c_m \rVert^{2} \tag{6} \]

The results here are quite unintuitive and surprising: working with an approximation of the data gives better results than working with the original data. Fortunately, a recent paper offers a suggestion as to why this might be so. It has been shown that initializing the cluster centers on a low-dimensional approximation of the data can improve the quality [6]; this is what clustering with SAX implicitly does.
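As a minimal sketch of the k-means objective described above (the sum of squared distances from each series to the center of its own cluster), the following function evaluates a given clustering. The function names and the toy data are illustrative assumptions, not taken from the paper.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_objective(series, assignments, centers):
    """Sum of squared distances from every series to its own cluster center.

    assignments[i] is the index of the cluster that series[i] belongs to.
    """
    return sum(squared_distance(x, centers[m])
               for x, m in zip(series, assignments))

# Toy data: two tight groups of two-point "series".
series = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
centers = [[0.0, 1.0], [10.0, 11.0]]

good = kmeans_objective(series, [0, 0, 1, 1], centers)  # natural grouping
bad = kmeans_objective(series, [0, 1, 0, 1], centers)   # groups mixed up

print(good, bad)  # the compact clustering has the smaller objective
```

Plotting this value after every k-means iteration, once for the raw data and once for the symbolic approximation, gives the kind of comparison the text describes.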

[Figure: objective function (y-axis) against number of iterations (x-axis), for the raw data and for SAX.]

Figure 4: A comparison of the k-means clustering algorithm using SAX and using the raw data. The dataset was Space Shuttle telemetry, 1,000 subsequences of length 512. Surprisingly, working with the symbolic approximation produces better results than working with the original data.

We also introduce another distance measure based on SAX. By applying it to clustering, we show that it outperforms the Euclidean distance measure.

4.2 Classification

Classification of time series has attracted much interest from the data mining community. Although special-purpose algorithms have been proposed [36], we will consider only the two most common classification algorithms, for brevity, for clarity of presentation, and to facilitate independent confirmation of our findings.

4.2.1 Nearest Neighbor Classification

To compare different distance measures on 1-nearest-neighbor classification, we use leave-one-out cross validation. Firstly, we compare SAX with Euclidean distance, IMPACTS, SDA, and LP∞. Two classic synthetic datasets are used: the Cylinder-Bell-Funnel (CBF) dataset has 50 instances of time series for each of the three clusters, and the Control Chart dataset has 100 instances for each of the six clusters [3]. Since SAX allows dimensionality and alphabet size as user input, and IMPACTS allows variable alphabet size, we ran the experiments on different combinations of dimensionality reduction and alphabet size. For the other approaches we applied the simple dimensionality reduction technique of skipping data points at a fixed interval. In Figure 5, we show the result with a dimensionality reduction of 4 to 1. Similar results were observed for other levels of dimensionality reduction. Once again, SAX's ability to beat Euclidean distance is probably due to the smoothing effect of dimensionality reduction; nevertheless, this experiment does show the superiority of SAX over the others proposed in the literature.
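The evaluation protocol described above, leave-one-out cross validation with a 1-nearest-neighbor classifier, can be sketched as follows. The `distance` argument is where any of the compared measures would be plugged in; the Euclidean default and the toy data are assumptions for illustration only.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def loo_1nn_error(series, labels, distance=euclidean):
    """Leave-one-out error rate of a 1-nearest-neighbor classifier."""
    errors = 0
    for i, query in enumerate(series):
        # Nearest neighbor among all series except the held-out one.
        nearest = min((j for j in range(len(series)) if j != i),
                      key=lambda j: distance(query, series[j]))
        if labels[nearest] != labels[i]:
            errors += 1
    return errors / len(series)

# Toy example: the last series is labeled "a" but sits next to class "b".
series = [[0, 0, 1], [0, 1, 1], [5, 5, 6], [5, 6, 6], [4, 5, 5]]
labels = ["a", "a", "b", "b", "a"]

error_rate = loo_1nn_error(series, labels)
print(error_rate)  # one of five series is misclassified
```

Because the classifier has no parameters of its own, differences in the error rate can be attributed to the distance measure being tested.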

[Figure: two panels, Cylinder-Bell-Funnel and Control Chart, each plotting error rate (y-axis) against alphabet size (x-axis) for IMPACTS, SDA, Euclidean, LP max, and SAX.]

Figure 5: A comparison of five distance measures' utility for nearest neighbor classification. We tested different alphabet sizes for SAX and IMPACTS; SDA's alphabet size is fixed at 5.

Since both IMPACTS and SDA perform poorly compared to Euclidean distance and SAX, we will exclude them from the rest of the classification experiments. To provide a closer look at how SAX compares to Euclidean distance, we ran an extensive experiment and compared the error rates on a collection of datasets available online. Each dataset is split into training and testing parts. We use the training part to search for the best values of the SAX parameters w (number of SAX words) and a (size of the alphabet):

For w, we search up to n/2 (n is the length of the time series), doubling the value of w each time.
For a, we search each value between 3 and 10.
If there is a tie, we use the smaller values.

The compression ratio (last column of the next table) is calculated as

\[ \text{compression ratio} = \frac{w \times \lceil \log_2 a \rceil}{n \times 32} \]

because for the SAX representation we need only \lceil \log_2 a \rceil bits per word, while for the original time series we need 4 bytes (32 bits) for each value.

Then we classify the testing set based on the training set using a one-nearest-neighbor classifier and report the error rate. The results are shown in Table 7. We also summarize the results by plotting the error rates for each dataset as a 2-dimensional point (EU_error, SAX_error). If a point falls within the lower triangle, then SAX is more accurate than Euclidean distance, and vice versa for the upper triangle. The plot is shown in Figure 6. From this experiment, we can conclude that SAX is competitive with Euclidean distance, but requires far less space.
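The space comparison described above (w symbols of about log2(a) bits each, against n values of 32 bits each) and the parameter grid can be sketched as follows. Rounding the bits per symbol up to a whole number, the starting value `w_start=2`, and the default range for `a` are assumptions of this sketch.

```python
import math

def sax_compression_ratio(n, w, a, bits_per_value=32):
    """Size of a SAX word relative to the raw series of n values."""
    sax_bits = w * math.ceil(math.log2(a))   # assumed: whole bits per symbol
    raw_bits = n * bits_per_value            # 4 bytes per original value
    return sax_bits / raw_bits

def candidate_parameters(n, w_start=2, a_min=3, a_max=10):
    """Grid of (w, a) pairs: w doubles up to n/2, a takes every value in range.

    w_start is an assumed starting point. The pairs come out in increasing
    order, so a search that keeps only strictly better candidates ends up
    preferring the smaller values when there is a tie.
    """
    grid = []
    w = w_start
    while w <= n // 2:
        grid.extend((w, a) for a in range(a_min, a_max + 1))
        w *= 2
    return grid

ratio = sax_compression_ratio(n=128, w=16, a=8)
grid = candidate_parameters(n=128)
print(f"{ratio:.2%} of the original size, {len(grid)} candidate settings")
```

For a series of length 128 reduced to 16 symbols over an 8-letter alphabet, the word occupies 48 bits against 4,096 bits for the raw values.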

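Table 7: 1-NN comparison between Euclidean Distance and SAX.

[Table residue: the columns are Name, Number of Classes, Size of Training Set, Size of Testing Set, Time Series Length, 1-NN EU Error, 1-NN SAX Error, w, a, and Compression Ratio. The rows are Synthetic Control, Gun-Point, CBF, Face (all), OSU Leaf, Swedish Leaf, 50Words, Trace, Two Patterns, Wafer, Face (four), Lightning (two datasets), ECG, Adiac, Yoga, Fish, Plane, Car, Beef, Coffee, and Olive Oil. The numeric entries are not recoverable.]

[Figure: scatter plot of the error rate of the SAX representation (y-axis) against the error rate of Euclidean distance (x-axis). The upper triangle is labeled "In this region Euclidean distance is more accurate" and the lower triangle "In this region SAX representation is more accurate".]

Figure 6: Error rates for SAX and Euclidean distance on the datasets in Table 7. The lower triangle is the region where SAX is more accurate than Euclidean distance, and the upper triangle is where Euclidean distance is more accurate than SAX.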

4.2.2 Decision Tree Classification

Because of Nearest Neighbor's poor scalability, it is unsuitable for most data mining applications; instead, decision trees are the most common choice of classifier. While decision trees are defined for real data, attempting to classify time series using the raw data would clearly be a mistake, since the high dimensionality and noise levels would result in a deep, bushy tree with poor accuracy.

In an attempt to overcome this problem, Geurts [4] suggests representing the time series as a Regression Tree (RT) (this representation is essentially the same as APCA [30], see Figure), and training the decision tree directly on this representation. The technique shows great promise.

We compared SAX to the Regression Tree (RT) on two datasets; the results are in Table 8.

Table 8: A comparison of SAX with the specialized Regression Tree approach for decision tree classification. Our approach used an alphabet size of 6; both approaches used a dimensionality of 8.

[Table residue: the columns are Dataset, SAX, and Regression Tree, with two rows, the second of which is CBF. The surviving entries are 3.04 (first dataset, SAX) and 0.97 (CBF, SAX); the remaining values are not recoverable.]

Note that while our results are competitive with the RT approach, the RT representation is undoubtedly superior in terms of interpretability [4]. Once again, our point is simply that our black box approach can be competitive with specialized solutions.

4.3 Query by Content (Indexing)

The majority of work on time series data mining appearing in the literature has addressed the problem of indexing time series for fast retrieval [53]. Indeed, it is in this context that most of the representations enumerated in Figure were introduced [, 0, 30, 63]. Dozens of papers have introduced techniques to do indexing with a symbolic approach [3, 7], but without exception, the answer set retrieved by these techniques can be very different from the answer set that would be retrieved by the true Euclidean distance. It is only by using a lower bounding technique that one can guarantee retrieving the full answer set, with no false dismissals [0].

To perform query by content, we build an index using SAX, and compare it to an index built using the Haar wavelet approach [].
Since the datasets we use are large and disk-resident, and the reduced dimensionality could still be potentially high (or at least high enough that the performance degenerates to sequential scan if an R-tree were used [6]), we use the Vector Approximation (VA) file as our indexing algorithm. We note, however, that SAX could also be indexed by classic string indexing techniques such as suffix trees.

To compare performance, we measure the percentage of disk I/Os required in order to retrieve the one-nearest neighbor to a randomly extracted query, relative to the number of disk I/Os required for sequential scan. Since it has been forcibly shown that the choice of dataset can make a significant difference in the relative indexing ability of a representation, we tested on more than 50 datasets from the UCR Time Series Data Mining Archive. In Figure 7 we show 4 representative examples. The y-axis shows the index power in terms of the percentage of the data retrieved from the disk, compared to sequential scan. In almost all cases, SAX shows a superior reduction in the number of disk accesses. In addition, SAX does not have the limitation faced by the Haar Wavelet that the data length must be a power of two.
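Why a lower-bounding distance guarantees no false dismissals can be shown with a small filter-and-refine sketch. It uses the well-known PAA lower bound on Euclidean distance as a stand-in for the reduced representation; it does not implement the VA-file, the SAX index, or the wavelet index from the experiment, and every name in it is illustrative.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def paa(series, w):
    """Piecewise Aggregate Approximation; assumes len(series) is divisible by w."""
    seg = len(series) // w
    return [sum(series[i * seg:(i + 1) * seg]) / seg for i in range(w)]

def paa_lower_bound(paa_q, paa_c, n):
    """Distance on the reduced series that never exceeds the true distance."""
    return math.sqrt(n / len(paa_q)) * euclidean(paa_q, paa_c)

def nearest_neighbor(query, database, w):
    """Exact 1-NN search that reads a full series only when it might win."""
    n = len(query)
    paa_q = paa(query, w)
    reduced = [paa(c, w) for c in database]      # what an index would store
    bounds = [paa_lower_bound(paa_q, r, n) for r in reduced]
    best_idx, best_dist, full_reads = None, math.inf, 0
    for i in sorted(range(len(database)), key=bounds.__getitem__):
        if bounds[i] >= best_dist:
            break        # every remaining candidate is at least this far away
        full_reads += 1  # stands in for one disk access
        d = euclidean(query, database[i])
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist, full_reads

# Toy database: sinusoids of different frequencies; the query is a shifted copy of one.
database = [[math.sin(0.1 * t * (k + 1)) for t in range(64)] for k in range(20)]
query = [v + 0.01 for v in database[5]]
idx, dist, full_reads = nearest_neighbor(query, database, w=8)
print(idx, round(dist, 4), full_reads)
```

Because the bound never overestimates the true distance, stopping the scan once the bound exceeds the best distance found so far cannot discard the true nearest neighbor. A tighter bound simply means fewer full reads.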

[Figure: bar chart comparing DWT Haar and SAX on the Ballbeam, Chaotic, Memory, and Winding datasets.]

Figure 7: A comparison of the indexing ability of wavelets versus SAX. The y-axis is the percentage of the data that must be retrieved from disk to answer a 1-NN query of length 256, when the dimensionality reduction ratio is 32 to 1 for both approaches.

4.4 Taking Advantage of the Discrete Nature of our Representation

In the previous sections we showed examples of how our proposed representation can compete with real-valued representations and the original data. In this section we illustrate examples of data mining algorithms that take explicit advantage of the discrete nature of our representation.

Detecting Novel/Surprising/Anomalous Behavior

A simple idea for detecting anomalous behavior in time series is to examine previously observed normal data and build a model of it. Data obtained in the future can be compared to this model, and any lack of conformity can signal an anomaly [4]. In order to achieve this, in [34] we combined a statistically sound scheme with an efficient combinatorial approach. The statistically sound scheme is based on Markov chains and normalization. Markov chains are used to model the normal behavior, which is inferred from the previously observed data. The time- and space-efficiency of the algorithm comes from the use of a suffix tree as the main data structure. Each node of the suffix tree represents a pattern. The tree is annotated with a score obtained by comparing the support of a pattern observed in the new data with the support recorded in the Markov model. This apparently simple strategy turns out to be very effective in discovering surprising patterns. In the original work we used a simple symbolic approach, similar to IMPACTS [7]; here we revisit the work using SAX.

For completeness, we will compare SAX to two highly referenced anomaly detection algorithms that are defined on real-valued representations: the TSA-tree wavelet-based approach of Shahabi et al. [54] and the immunology (IMM) inspired work of Dasgupta and Forrest [4]. We also include the Markov technique using IMPACTS and SDA in order to discover how much of the difference can be attributed directly to the representation. Figure 8 contains an experiment comparing all 5 techniques.
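The idea of modeling normal behavior on a symbol string and flagging lack of conformity can be sketched with a deliberately simplified model. This is not the algorithm of [34], which uses a suffix tree and a more careful scoring scheme; the first-order Markov chain, the add-one smoothing, the window length, and the toy strings below are all assumptions.

```python
import math
from collections import Counter

def train_markov(symbols, alphabet, smoothing=1.0):
    """First-order transition probabilities estimated from normal data."""
    pair_counts = Counter(zip(symbols, symbols[1:]))
    from_counts = Counter(symbols[:-1])
    probs = {}
    for a in alphabet:
        denom = from_counts[a] + smoothing * len(alphabet)
        for b in alphabet:
            probs[(a, b)] = (pair_counts[(a, b)] + smoothing) / denom
    return probs

def surprise_scores(symbols, probs, window):
    """Average negative log-probability of the transitions in each window."""
    costs = [-math.log(probs[pair]) for pair in zip(symbols, symbols[1:])]
    return [sum(costs[i:i + window]) / window
            for i in range(len(costs) - window + 1)]

alphabet = "abc"
normal = "abcb" * 50                        # stand-in for a discretized periodic signal
test = "abcb" * 10 + "aaaa" + "abcb" * 10   # same pattern with a planted anomaly

probs = train_markov(normal, alphabet)
scores = surprise_scores(test, probs, window=4)
most_surprising = max(range(len(scores)), key=scores.__getitem__)
print(most_surprising)  # start of the window that conforms least to the model
```

Transitions that were common in the training string cost little, while transitions that were never seen cost a lot, so the planted run of repeated symbols stands out.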

[Figure: seven stacked panels, labeled I through VII.]

Figure 8: A comparison of five anomaly detection algorithms on the same task. (I) The training data, a slightly noisy sine wave of length 1,000. (II) The time series to be examined for anomalies is a noisy sine wave that was created with the same parameters as the training sequence; an assortment of anomalies was then introduced at time periods 250, 500 and 750. (III) and (IIII) The Markov model technique using the IMPACTS and SDA representations did not clearly discover the anomalies, and reported some false alarms. (V) The IMM anomaly detection algorithm appears to have discovered the first anomaly, but it also reported many false alarms. (VI) The TSA-Tree approach is unable to detect the anomalies. (VII) The Markov model-based technique using SAX clearly finds the anomalies, with no false alarms.

The results on this simple experiment are impressive. Since suffix trees and Markov models can be used only on discrete data, this offers a motivation for our symbolic approach. All the other approaches, including the Markov models using the IMPACTS and SDA representations, the immunology-based anomaly detection approach, and the TSA-Tree approach, did not clearly discover the anomalies and reported some false alarms, whereas the SAX-based Markov model clearly finds the anomalies with no false alarms.

Motif Discovery

It is well understood in bioinformatics that overrepresented DNA sequences often have biological significance [5, 9, 5]. A substantial body of literature has been devoted to techniques to discover such patterns [5, 57, 60]. In a previous work, we defined the related concept of a time series motif [43]. Time series motifs are close analogues of their discrete cousins, although the definitions must be augmented to prevent certain degenerate solutions. The naïve algorithm to discover the motifs is quadratic in the length of the time series. In [43], we demonstrated a simple technique to mitigate the quadratic complexity by a large constant factor; nevertheless, this time complexity is clearly untenable for most real datasets.

The symbolic nature of SAX offers a unique opportunity to avail of the wealth of bioinformatics research in this area. In particular, recent work by Tompa and Buhler holds great promise [60].
The authors show that many previously unsolvable motif discovery problems can be solved by hashing subsequences into buckets using a random subset of their features as a key, then doing some post-processing search on the hash buckets (footnote 3). They call their algorithm PROJECTION.

We carefully reimplemented the random projection algorithm of Tompa and Buhler, making minor changes in the post-processing step to allow for the fact that although we are hashing random projections of our symbolic representation, we actually wish to discover motifs defined on the original raw data [3]. Figure 9 shows an example of a motif discovered in an industrial dataset [8] using this technique. The patterns found are extremely similar to one another.

Footnote 3: Of course, this description greatly understates the contributions of this work. We urge the reader to consult the original paper.
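The hashing step described above can be sketched as follows: in each round, a random subset of symbol positions serves as the hash key, and pairs of words that land in the same bucket are counted. This is only an illustration of the idea, not the PROJECTION algorithm or the modified version used here; the post-processing on the raw data is omitted, and the words, mask size, and iteration count are assumptions.

```python
import random
from collections import defaultdict
from itertools import combinations

def random_projection_collisions(words, mask_size, iterations, seed=0):
    """Count how often each pair of words shares a bucket under random masks."""
    rng = random.Random(seed)
    word_len = len(words[0])
    collisions = defaultdict(int)
    for _ in range(iterations):
        # Pick a random subset of positions to act as the hash key.
        mask = sorted(rng.sample(range(word_len), mask_size))
        buckets = defaultdict(list)
        for idx, word in enumerate(words):
            key = "".join(word[p] for p in mask)
            buckets[key].append(idx)
        for members in buckets.values():
            for i, j in combinations(members, 2):
                collisions[(i, j)] += 1
    return collisions

# Toy SAX words: words 0 and 2 differ in a single symbol; no other pair
# agrees on as many as three positions.
words = ["abcabc", "ccbbaa", "abcabb", "bacbac", "cacaca"]
collisions = random_projection_collisions(words, mask_size=3, iterations=50)
best_pair = max(collisions, key=collisions.get)
print(best_pair, collisions[best_pair])
```

Pairs that collide often are motif candidates. In the approach described in the text, such candidates would then be checked against the original raw subsequences.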

[Figure: the Winding dataset (angular speed of reel), with two subsequences marked A and B; below, the two subsequences overlaid.]

Figure 9: Above, a motif discovered in a complex dataset by the modified PROJECTION algorithm. Below, the motif is best visualized by aligning the two subsequences and zooming in. The similarity of the two subsequences is striking, and hints at unexpected regularity.

Apart from the attractive scalability of the algorithm, there is another important advantage over other approaches. The PROJECTION algorithm is able to discover motifs even in the presence of noise. Our extension of the algorithm inherits this robustness to noise. We direct interested readers to [3] for a more detailed discussion of this algorithm.

Visualization

Data visualization techniques are very important for data analysis, since the human eye has been frequently advocated as the ultimate data-mining tool. However, despite their illustrative nature, which can provide users with a better understanding of the data and an intuitive interpretation of the mining results, there has been surprisingly little work on visualizing large time series datasets. One reason for this lack of interest is that time series data are also usually very massive in size. With limited pixel space and the typically enormous amount of data at hand, it is infeasible to display all the data on the screen at once, much less find any useful information in the data. How to efficiently organize the data and present them in a way that is intuitive and comprehensible to human eyes thus remains a great challenge. Ideally, the visualization technique should follow the Visual Information Seeking Mantra, as summarized by Dr. Ben Shneiderman: "Overview, zoom & filter, details-on-demand." In other words, it should be able to provide users with an overview or summary of the data, and allow users to further investigate the interesting patterns highlighted by the tool.

To this end, we developed VizTree [4], a time series pattern discovery and visualization system based on augmenting suffix trees. VizTree visually summarizes both the global and local structures of time series data at the same time. In addition, it provides novel interactive solutions to many pattern discovery problems, including the discovery of frequently occurring patterns (motif discovery), surprising patterns (anomaly detection), and query by content.
The user-interactive paradigm allows users to visually explore the time series and perform real-time hypothesis testing. Since the use of a suffix tree requires that the input data be discrete, SAX is the perfect candidate for discretizing the time series data.

Compared to the existing time series visualization systems in the literature, VizTree is unique in several respects. First, almost all other approaches assume highly periodic time series, whereas VizTree makes no such assumption. Second, other methods typically require space (both memory space and pixel space) that grows at least linearly with the length of the time series, making them untenable for mining massive datasets. Finally, VizTree allows us to visualize a much richer set of features, including global summaries of the differences between two time series, locally repeated patterns, anomalies, etc.

In VizTree, patterns are represented in a depth-limited tree structure, in which their frequencies of occurrence are encoded in the thicknesses of branches. The algorithm works by sliding a window across the time series and extracting subsequences of user-defined lengths. The subsequences are then discretized into strings by SAX and inserted into an augmented suffix tree. Each string is regarded as a pattern, and the frequency of occurrence for each pattern is encoded by the thickness of the branch: the thicker the branch, the more frequent the corresponding pattern. Motif discovery and anomaly detection can thus be easily achieved: patterns that occur frequently can be regarded as motifs, and those that occur rarely can be regarded as anomalies.

Figure 0 shows a screenshot of VizTree for anomaly detection on the Dutch power demand dataset. Electricity consumption is recorded every 15 minutes; therefore, for the year of 1997, there are 35,040 data points. The majority of the weeks follow the regular Monday-Friday, 5-working-day pattern, as shown by the thick branches. The thin branches denote the anomalies, in the sense that the electricity consumption is abnormal given the day of the week. Note that in VizTree we reverse the alphabet ordering, so the alphabets now read top-down rather than bottom-up (e.g. a is now the topmost branch, rather than the bottommost branch). This way, the string better describes the actual shape of the time series (a denotes the top region, b the middle region, c the bottom region). The top right window shows the subtree when we click on the 2nd child of the root node. Clicking on any of the existing branches in the main or the subtree window will plot the subsequences represented by them in the bottom right window. The highlighted, circled subsequence is retrieved by clicking on the branch bab. The zoom-in shows why it is an anomaly: it is the beginning of the three-day week during Christmas (Thursday and Friday off). The other thin branches denote other anomalies such as New Year's Day, Good Friday, Queen's Birthday, etc.

[Figure: VizTree screenshot with a zoomed-in view of the highlighted subsequence.]

Figure 0: Anomaly detection on power consumption data. The anomaly shown here is a short week during Christmas.

The evaluation of visualization techniques is usually subjective. Although VizTree clearly demonstrates its capability in detecting non-trivial patterns, in [40] we also devise a measure that quantifies the effectiveness of the algorithm. The measure, which we call the dissimilarity coefficient, describes how dissimilar two time series are, and ranges from 0 to 1. In essence, the coefficient summarizes the difference in behavior of each pattern (represented by a string) in two time series. More concretely, for each pattern, we count its respective numbers of occurrences in both time series, and see how much the frequencies differ.
We call this measure support, which is then weighted by the confidence, or the degree of interestingness of the pattern. For example, a pattern that occurs 120 times in time series A and 100 times in time series B is probably less significant than a pattern that occurs 20 times in A but zero times in B, even though the support for both cases is 20. Subtracting the dissimilarity coefficient from 1 gives us a novel similarity measure that describes how similar two time series are. More details on the dissimilarity measure can be found in [40].

An important fact about this similarity measure is that, unlike a distance measure that computes point-to-point distances, it captures the global structure of the time series rather than local differences. This time-invariant feature is useful if we are interested in the overall structures of the time series. Figure shows the dendrogram of the clustering result using the dissimilarity coefficient as the distance measure. It clearly demonstrates that the coefficient captures the dissimilarity very well and that all clusters are separated perfectly. Note that it is even able to distinguish the four different sets of heartbeats (which include, from the top down, clusters 4, 5, and 6).

Figure: Clustering result using the dissimilarity coefficient.

As a reference, we ran the same clustering algorithm using the widely used Euclidean distance. The result is shown in Figure. Clearly, clustering using our dissimilarity measure returns superior results.

Figure: Clustering result using Euclidean distance.
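The pattern counting that underlies VizTree (slide a window across the series, discretize each subsequence with SAX, and count each resulting string) can be sketched as follows. A plain counter stands in for the augmented suffix tree, and letters run from the lowest region upward, that is, without VizTree's reversed ordering. The function names and the toy series are assumptions for illustration.

```python
from collections import Counter
from statistics import NormalDist, mean, pstdev

def sax_word(subsequence, w, a):
    """Discretize one subsequence: z-normalize, reduce to w segments, map to a symbols."""
    mu, sigma = mean(subsequence), pstdev(subsequence)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in subsequence]
    seg = len(z) // w                       # assumes the length is divisible by w
    paa = [mean(z[i * seg:(i + 1) * seg]) for i in range(w)]
    # Breakpoints that cut the standard normal curve into a equiprobable regions.
    cuts = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    letters = "abcdefghijklmnopqrstuvwxyz"
    return "".join(letters[sum(v > c for c in cuts)] for v in paa)

def pattern_frequencies(series, window, w, a):
    """Frequency of every SAX word produced by a sliding window of the given length."""
    return Counter(sax_word(series[i:i + window], w, a)
                   for i in range(len(series) - window + 1))

# Toy series: a regular square wave with a single corrupted point.
series = [0.0, 0.0, 1.0, 1.0] * 8
series[14] = 0.0

freq = pattern_frequencies(series, window=4, w=4, a=3)
rare = sorted(word for word, count in freq.items() if count == 1)
print(freq.most_common(4))   # frequent words: motif candidates
print(rare)                  # words seen once: anomaly candidates
```

In the terms used above, the frequent words correspond to thick branches and the words seen only once correspond to thin ones. Counting the same words in two different series and comparing the counts is also the raw material for a frequency-based comparison such as the dissimilarity coefficient, whose exact definition is given in [40] and is not reproduced here.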

A Fast Algorithm for Computing the Deceptive Degree of an Objective Function

A Fast Algorithm for Computing the Deceptive Degree of an Objective Function IJCSNS Iteratoal Joural of Computer See ad Networ Seurty, VOL6 No3B, Marh 6 A Fast Algorthm for Computg the Deeptve Degree of a Objetve Futo LI Yu-qag Eletro Tehque Isttute, Zhegzhou Iformato Egeerg Uversty,

More information

Fuzzy Risk Evaluation Method for Information Technology Service

Fuzzy Risk Evaluation Method for Information Technology Service Fuzzy Rsk Evaluato Method for Iformato Tehology Serve Outsourg Qasheg Zhag Yrog Huag Fuzzy Rsk Evaluato Method for Iformato Tehology Serve Outsourg 1 Qasheg Zhag 2 Yrog Huag 1 Shool of Iformats Guagdog

More information

Universal Prediction Applied to Stylistic Music Generation Gיrard Assayag (Ircam), Shlomo Dubnov (Ben Gurion Univ.)

Universal Prediction Applied to Stylistic Music Generation Gיrard Assayag (Ircam), Shlomo Dubnov (Ben Gurion Univ.) Uversal Predto Appled to Stylst Mus Geerato Gיrard Assayag (Iram), Shlomo Dubov (Be Guro Uv.) Abstrat Capturg a style of a partular pee or a omposer s ot a easy task. Several attempts to use mahe learg

More information

A Hierarchical Fuzzy Linear Regression Model for Forecasting Agriculture Energy Demand: A Case Study of Iran

A Hierarchical Fuzzy Linear Regression Model for Forecasting Agriculture Energy Demand: A Case Study of Iran 3rd Iteratoal Coferee o Iformato ad Faal Egeerg IPEDR vol. ( ( IACSIT Press, Sgapore A Herarhal Fuzz Lear Regresso Model for Foreastg Agrulture Eerg Demad: A Case Stud of Ira A. Kazem, H. Shakour.G, M.B.

More information

Preprocess a planar map S. Given a query point p, report the face of S containing p. Goal: O(n)-size data structure that enables O(log n) query time.

Preprocess a planar map S. Given a query point p, report the face of S containing p. Goal: O(n)-size data structure that enables O(log n) query time. Computatoal Geometry Chapter 6 Pot Locato 1 Problem Defto Preprocess a plaar map S. Gve a query pot p, report the face of S cotag p. S Goal: O()-sze data structure that eables O(log ) query tme. C p E

More information

6.7 Network analysis. 6.7.1 Introduction. References - Network analysis. Topological analysis

6.7 Network analysis. 6.7.1 Introduction. References - Network analysis. Topological analysis 6.7 Network aalyss Le data that explctly store topologcal formato are called etwork data. Besdes spatal operatos, several methods of spatal aalyss are applcable to etwork data. Fgure: Network data Refereces

More information

Average Price Ratios

Average Price Ratios Average Prce Ratos Morgstar Methodology Paper August 3, 2005 2005 Morgstar, Ic. All rghts reserved. The formato ths documet s the property of Morgstar, Ic. Reproducto or trascrpto by ay meas, whole or

More information

IDENTIFICATION OF THE DYNAMICS OF THE GOOGLE S RANKING ALGORITHM. A. Khaki Sedigh, Mehdi Roudaki

IDENTIFICATION OF THE DYNAMICS OF THE GOOGLE S RANKING ALGORITHM. A. Khaki Sedigh, Mehdi Roudaki IDENIFICAION OF HE DYNAMICS OF HE GOOGLE S RANKING ALGORIHM A. Khak Sedgh, Mehd Roudak Cotrol Dvso, Departmet of Electrcal Egeerg, K.N.oos Uversty of echology P. O. Box: 16315-1355, ehra, Ira sedgh@eetd.ktu.ac.r,

More information

Hi-Tech Authentication for Palette Images Using Digital Signature and Data Hiding

Hi-Tech Authentication for Palette Images Using Digital Signature and Data Hiding The Iteratoal Arab Joural of Iformato Tehology, Vol. 8, No., Aprl 0 7 H-Teh Authetato for Palette Images Usg Dgtal Sgature ad Data Hdg Aroka Jasra, Regasvaguruatha Rajesh, Ramasamy Balasubramaa, ad Perumal

More information

Spatial Keyframing for Performance-driven Animation

Spatial Keyframing for Performance-driven Animation Eurographs/ACSIGGRAPH Symposum o Computer Amato (25) K. Ajyo, P. Faloutsos (Edtors) Spatal Keyframg for Performae-drve Amato T. Igarash,3, T. osovh 2, ad J. F. Hughes 2 The Uversty of Tokyo 2 Brow Uversty

More information

Chapter Eight. f : R R

Chapter Eight. f : R R Chapter Eght f : R R 8. Itroducto We shall ow tur our atteto to the very mportat specal case of fuctos that are real, or scalar, valued. These are sometmes called scalar felds. I the very, but mportat,

More information

ANOVA Notes Page 1. Analysis of Variance for a One-Way Classification of Data

ANOVA Notes Page 1. Analysis of Variance for a One-Way Classification of Data ANOVA Notes Page Aalss of Varace for a Oe-Wa Classfcato of Data Cosder a sgle factor or treatmet doe at levels (e, there are,, 3, dfferet varatos o the prescrbed treatmet) Wth a gve treatmet level there

More information

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ " 1

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ  1 STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS Recall Assumpto E(Y x) η 0 + η x (lear codtoal mea fucto) Data (x, y ), (x 2, y 2 ),, (x, y ) Least squares estmator ˆ E (Y x) ˆ " 0 + ˆ " x, where ˆ

More information

Numerical Methods with MS Excel

Numerical Methods with MS Excel TMME, vol4, o.1, p.84 Numercal Methods wth MS Excel M. El-Gebely & B. Yushau 1 Departmet of Mathematcal Sceces Kg Fahd Uversty of Petroleum & Merals. Dhahra, Saud Araba. Abstract: I ths ote we show how

More information

Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases

Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases Locally Adaptve Dmesoalty educto for Idexg Large Tme Seres Databases Kaushk Chakrabart Eamo Keogh Sharad Mehrotra Mchael Pazza Mcrosoft esearch Uv. of Calfora Uv. of Calfora Uv. of Calfora edmod, WA 985

More information

Improving website performance for search engine optimization by using a new hybrid MCDM model

Improving website performance for search engine optimization by using a new hybrid MCDM model Improvg webste performae for searh ege optmzato by usg a ew hybrd MDM model Ye-hag he Isttute of ha ad Asa-Paf Studes, Natoal Su Yat-se Uversty, awa, R.O.. tayler530259@gmal.om Yu-Sheg Lu Departmet of

More information

The Gompertz-Makeham distribution. Fredrik Norström. Supervisor: Yuri Belyaev

The Gompertz-Makeham distribution. Fredrik Norström. Supervisor: Yuri Belyaev The Gompertz-Makeham dstrbuto by Fredrk Norström Master s thess Mathematcal Statstcs, Umeå Uversty, 997 Supervsor: Yur Belyaev Abstract Ths work s about the Gompertz-Makeham dstrbuto. The dstrbuto has

More information

Statistical Pattern Recognition (CE-725) Department of Computer Engineering Sharif University of Technology

Statistical Pattern Recognition (CE-725) Department of Computer Engineering Sharif University of Technology I The Name of God, The Compassoate, The ercful Name: Problems' eys Studet ID#:. Statstcal Patter Recogto (CE-725) Departmet of Computer Egeerg Sharf Uversty of Techology Fal Exam Soluto - Sprg 202 (50

More information

1. The Time Value of Money

1. The Time Value of Money Corporate Face [00-0345]. The Tme Value of Moey. Compoudg ad Dscoutg Captalzato (compoudg, fdg future values) s a process of movg a value forward tme. It yelds the future value gve the relevat compoudg

More information

Applications of Support Vector Machine Based on Boolean Kernel to Spam Filtering

Applications of Support Vector Machine Based on Boolean Kernel to Spam Filtering Moder Appled Scece October, 2009 Applcatos of Support Vector Mache Based o Boolea Kerel to Spam Flterg Shugag Lu & Keb Cu School of Computer scece ad techology, North Cha Electrc Power Uversty Hebe 071003,

More information

Abraham Zaks. Technion I.I.T. Haifa ISRAEL. and. University of Haifa, Haifa ISRAEL. Abstract

Abraham Zaks. Technion I.I.T. Haifa ISRAEL. and. University of Haifa, Haifa ISRAEL. Abstract Preset Value of Autes Uder Radom Rates of Iterest By Abraham Zas Techo I.I.T. Hafa ISRAEL ad Uversty of Hafa, Hafa ISRAEL Abstract Some attempts were made to evaluate the future value (FV) of the expected

More information

T = 1/freq, T = 2/freq, T = i/freq, T = n (number of cash flows = freq n) are :

T = 1/freq, T = 2/freq, T = i/freq, T = n (number of cash flows = freq n) are : Bullets bods Let s descrbe frst a fxed rate bod wthout amortzg a more geeral way : Let s ote : C the aual fxed rate t s a percetage N the otoal freq ( 2 4 ) the umber of coupo per year R the redempto of

More information

A Comparison of the Performance of Two-Tier Cellular Networks Based on Queuing Handoff Calls

A Comparison of the Performance of Two-Tier Cellular Networks Based on Queuing Handoff Calls Iteratoal Joural of Appled Matemats ad Computer Sees 2;2 www.waset.org Sprg 2006 A Comparso of te erformae of Two-Ter Cellular Networks Based o Queug Hadoff Calls Tara Sal ad Kemal Fdaboylu Abstrat Two-ter

More information

The analysis of annuities relies on the formula for geometric sums: r k = rn+1 1 r 1. (2.1) k=0

The analysis of annuities relies on the formula for geometric sums: r k = rn+1 1 r 1. (2.1) k=0 Chapter 2 Autes ad loas A auty s a sequece of paymets wth fxed frequecy. The term auty orgally referred to aual paymets (hece the ame), but t s ow also used for paymets wth ay frequecy. Autes appear may

More information

The Digital Signature Scheme MQQ-SIG

The Digital Signature Scheme MQQ-SIG The Dgtal Sgature Scheme MQQ-SIG Itellectual Property Statemet ad Techcal Descrpto Frst publshed: 10 October 2010, Last update: 20 December 2010 Dalo Glgorosk 1 ad Rue Stesmo Ødegård 2 ad Rue Erled Jese

More information

CHAPTER 2. Time Value of Money 6-1

CHAPTER 2. Time Value of Money 6-1 CHAPTER 2 Tme Value of Moey 6- Tme Value of Moey (TVM) Tme Les Future value & Preset value Rates of retur Autes & Perpetutes Ueve cash Flow Streams Amortzato 6-2 Tme les 0 2 3 % CF 0 CF CF 2 CF 3 Show

More information

APPENDIX III THE ENVELOPE PROPERTY

APPENDIX III THE ENVELOPE PROPERTY Apped III APPENDIX III THE ENVELOPE PROPERTY Optmzato mposes a very strog structure o the problem cosdered Ths s the reaso why eoclasscal ecoomcs whch assumes optmzg behavour has bee the most successful

More information

Speeding up k-means Clustering by Bootstrap Averaging

Speeding up k-means Clustering by Bootstrap Averaging Speedg up -meas Clusterg by Bootstrap Averagg Ia Davdso ad Ashw Satyaarayaa Computer Scece Dept, SUNY Albay, NY, USA,. {davdso, ashw}@cs.albay.edu Abstract K-meas clusterg s oe of the most popular clusterg

More information

MDM 4U PRACTICE EXAMINATION
