MINING TIME SERIES DATA BASED UPON CLOUD MODEL

Hehua Chi a, Juebo Wu b, *, Shuliang Wang a, b, Lianhua Chi c, Meng Fang a

a International School of Software, Wuhan University, Wuhan 430079, China - hehua556@63.com
b State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China - wujuebo@gmail.com
c School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China - liahua_chi@63.com

KEY WORDS: Spatial Data Mining, Time-series, Cloud Model, Prediction

ABSTRACT:

In recent years many attempts have been made to index, cluster, classify and mine prediction rules from increasingly massive sources of spatial time-series data. In this paper, a novel approach to mining time-series data is proposed based on the cloud model, which is described by numerical characteristics. Firstly, cloud model theory is introduced into time series data mining. Time-series data can be described by three numerical characteristics as their features: expectation, entropy and hyper-entropy. Secondly, the features of time-series data can be generated through the backward cloud generator and regarded as time-series numerical characteristics based on the cloud model. Taking such numerical characteristics as sample sets, the prediction rules are obtained by curve fitting. Thirdly, the model of mining time-series data is presented, mainly including the numerical characteristics and prediction rule mining. Lastly, a case study is carried out for the prediction of satellite images. The results show that the model is feasible and can be easily applied to other forecasting tasks.

1. INTRODUCTION

With the rapid development of spatial information technology, especially spatial data acquisition technology, spatial databases have become the basis of many applications.
Through spatial data mining or knowledge discovery, we can obtain general geometric knowledge from spatial databases, including spatial distribution, spatial association rules, spatial clustering rules and spatial evolution rules, which provides a powerful tool for making full use of spatial data resources [1, 2]. Because most spatial data in the real world is associated with time, more and more researchers have started to pay attention to time series data mining. Time series model mining plays an important role in data mining, and time series forecasting has always been a hot issue in many scientific fields. The study of time series data includes the following important aspects: trend analysis, similarity search, sequential pattern mining and cycle pattern mining of time-related data, time series prediction and so on.

G. Box et al. divided the sequence into several sub-sequences through a moving window, clustered these sub-sequences into specific patterns of change, and then discovered characteristic change patterns in the manner of association rules [3]. J. Han et al. used data mining technology to study full and partial periodic fragments of the time series in a time series database, in order to discover the cyclical patterns of time series [4]. In [5], the authors proposed cyclic association rule mining, and [6] proposed calendar association rule mining. R. Agrawal et al. gave a series of sub-sequence matching criteria [7]: two sequences are considered similar when a sufficient number of non-overlapping, similar sub-sequences exist between them in the right time order. Since [8] published the first research paper on a piecewise linear fitting algorithm for sequences, related research has received extensive attention [9, 10, 11]. This simple and intuitive representation uses a series of linear approximations joined end to end to represent a time series.

In combination with the cloud model, W. H. Cui et al. proposed a new method of image segmentation based on cloud model theory, which adds the uncertainty of the image to the segmentation algorithm [12]. X. Y. Tang et al. presented a cloud mapping space based on gradation by the cloud theory, aiming to solve the problem of land use classification of RS images [13]. K. Qin proposed a novel way for weather classification based on the cloud model and hierarchical clustering [14].

In applications, time series analysis has been widely used in various fields of society, such as macro-control of the national economy, regional integrated development planning, business management, market potential prediction, weather forecasting, hydrological forecasting, earthquake precursor prediction, environmental pollution control, ecological balance, marine survey and so on.

Time series data mining research has made some progress, but there are still shortcomings; for example, the time series is required to be smooth and normally distributed. This paper presents a prediction model based on the cloud model. This model describes the features of time series data through three numerical characteristics. The time series data, such as images, are standardized as cloud droplets after pre-processing, and are then transformed through the backward cloud generator to get the three numerical characteristics. In the same way, we process a number of sample sets to obtain a series of cloud numerical characteristics. Through curve fitting, we can mine the prediction rules to achieve prediction.

* Corresponding author. wujuebo@gmail.com.
2. THE CLOUD MODEL

2.1 The cloud model

The cloud model was proposed by D. Y. Li in 1996 [15]. It is a mathematical model of the uncertain conversion between qualitative knowledge and quantitative values, in which fuzziness and randomness are fully integrated to constitute a mapping between the qualitative and the quantitative.

Definition: Let U be a universal set described by precise numbers, and C be the qualitative concept related to U. Assume that there is a number $x \in U$ that randomly stands for the concept C, and that the certainty degree $\mu(x)$ of x for C is a random value with a stabilization tendency that meets:

$$\mu : U \rightarrow [0, 1], \quad \forall x \in U, \; x \rightarrow \mu(x)$$

where the distribution of x on U is defined as a cloud, and each x is defined as a cloud drop. Figure 1 shows the numerical characteristics {Ex, En, He} of the cloud.

Figure 1. The numerical characteristics {Ex, En, He}

In the cloud model, we employ the expected value Ex, the entropy En, and the hyper-entropy He to represent the concept as a whole.

The expected value Ex: The mathematical expectation of the cloud drops distributed in the universal set.

The entropy En: The uncertainty measurement of the qualitative concept. It is determined by both the randomness and the fuzziness of the concept.

The hyper-entropy He: The uncertainty measurement of the entropy, i.e., the second-order entropy of the entropy, which is determined by both the randomness and the fuzziness of the entropy.

2.2 Backward cloud generator

The backward cloud generator is an uncertainty conversion model which realizes the random conversion between numerical values and linguistic values at any time, as the mapping from quantitative to qualitative. It effectively converts a certain number of precise data into an appropriate qualitative linguistic value (Ex, En, He). According to that, we can get the whole of the cloud droplets. The more precise data there is, the more precise the concept will be. Through the forward and backward cloud generators, the cloud model establishes interrelated and interdependent relations. The algorithm of the backward cloud generator is as follows:

Input: Samples $x_i$ and the certainty degree $C_T(x_i)$ (i = 1, 2, ..., N)
Output: the qualitative concept (Ex, En, He)
Steps:
(1) Calculate the average value of $x_i$, i.e., $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} x_i$, the first-order absolute central moment of the samples $M_1 = \frac{1}{N}\sum_{i=1}^{N} |x_i - \bar{X}|$, and the sample variance $S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{X})^2$
(2) $Ex = \bar{X}$
(3) $En = \sqrt{\frac{\pi}{2}} \times M_1$
(4) $He = \sqrt{S^2 - En^2}$

3. TIME-SERIES MINING MODEL BASED ON CLOUD MODEL

Using cloud model theory, this section proposes the framework of time-series mining and gives concrete steps. The time-series data can be described by the three numerical characteristics of the cloud model.

3.1 Time-series mining framework based on cloud model

The main idea of the time-series mining framework based on the cloud model is as follows:

Firstly, extract the experimental data from the time-series databases and pre-process the data to obtain cloud droplets. Secondly, use the backward cloud generator algorithm to extract the numerical characteristics {Ex, En, He} of the cloud droplets. Thirdly, repeat the extraction for each set of cloud droplets to get all groups of the numerical characteristics {Ex, En, He}. Finally, use the fitting algorithm to fit all groups of the numerical characteristics {Ex, En, He}, in order to forecast according to the curve. The framework of time-series mining and the main flow based on the cloud model are shown in Figure 2, which gives the specific description of this framework.

3.2 Data pre-processing

The main purpose of pre-processing is to eliminate irrelevant information and to recover useful information. The time-series data from a time-series database is rough, and it must be pre-processed before the next step can be carried out. The object of the cloud model is cloud droplets, so we have to turn the time-series data into cloud droplets as the input of the cloud model.
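The backward cloud generator of Section 2.2, applied to the droplets produced by the pre-processing step, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: it uses the variant of steps (1) to (4) that needs no certainty degrees, the function names `backward_cloud`, `frame_to_drops` and `characteristics_series` are hypothetical, each sample is assumed to be a small 2-D list of numeric values, the sample variance is taken with the N - 1 denominator, and clamping S^2 - En^2 at zero is an added safeguard so that He stays real for small samples.

```python
import math


def backward_cloud(drops):
    """Estimate (Ex, En, He) from a list of cloud drops (no certainty degrees)."""
    n = len(drops)
    if n < 2:
        raise ValueError("need at least two cloud drops")
    mean = sum(drops) / n
    # First-order absolute central moment of the samples.
    abs_moment = sum(abs(x - mean) for x in drops) / n
    # Sample variance (N - 1 denominator is an assumption of this sketch).
    variance = sum((x - mean) ** 2 for x in drops) / (n - 1)
    ex = mean
    en = math.sqrt(math.pi / 2) * abs_moment
    # Assumption: clamp at zero, since S^2 - En^2 can be negative for small samples.
    he = math.sqrt(max(variance - en ** 2, 0.0))
    return ex, en, he


def frame_to_drops(frame):
    """Flatten a 2-D matrix (list of rows) into a flat list of cloud drops."""
    return [float(v) for row in frame for v in row]


def characteristics_series(frames):
    """Return one (Ex, En, He) triple per frame, in time order."""
    return [backward_cloud(frame_to_drops(f)) for f in frames]


# Hypothetical 2x3 frames standing in for pre-processed time-series data.
frames = [
    [[1, 2, 3], [4, 5, 6]],
    [[2, 2, 3], [3, 4, 4]],
]
for ex, en, he in characteristics_series(frames):
    print(f"Ex={ex:.3f} En={en:.3f} He={he:.3f}")
```

Each frame yields one triple, so a sequence of frames becomes three parallel series (one each for Ex, En and He) that can serve as sample sets for curve fitting.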
Figure 2. The framework of time-series mining based on cloud model

3.3 Numerical Characteristic Extraction

Through the numerical characteristic extraction of the time-series data based on the cloud model, we can obtain the numerical characteristic sets of the time-series data. These characteristics are direct descriptions of the time-series data and stand for its internal information. Time-series data can come from a time-series database and can also change over time. We make use of the backward cloud generator to extract the numerical characteristics; the algorithm is given in Section 2.2. The numerical characteristics {Ex, En, He} of the cloud reflect the quantitative characteristics of the qualitative concept.

Ex (Expected value): The point in the domain space that best represents the qualitative concept; it reflects the centre of gravity of the cloud of the concept.

En (Entropy): Entropy measures the fuzziness and randomness of the qualitative concept, reflecting the uncertainty of the concept. The entropy of time-series data reflects the size of the range covered by the cloud droplets in the domain space.

He (Hyper-entropy): The uncertainty of the entropy, i.e., the entropy of the entropy. It reflects the cohesion of all the cloud droplets in the domain space, and its value indirectly expresses the dispersion of the cloud.

By the numerical characteristic extraction of the time-series data based on the cloud model, we can get the numerical characteristics of a large number of cloud droplets. These numerical characteristics describe the overall features of the time-series data, as well as the trend of its development and change over time. The next step is the curve fitting of the numerical characteristics to obtain prediction rules.

3.4 Prediction Rule Mining

This process mainly identifies the prediction curves. We regard the numerical characteristics extracted from the time-series data as the sample set of a curve fitting, and get the fitting curve by a curve-fitting method. Because time-series data differ, the proper curve-fitting algorithm should be chosen for the specific application, for example the least squares method. The (Ex, En, He) of the time-series data is fitted in this way to achieve the prediction rule mining.

According to the causality between prediction objects and factors, we can achieve the prediction. There are many factors associated with the target, and we must choose the factors that have a strong causal relationship with it. Let the factor variable X and the dependent variable Y express the prediction target. If the relationship between X and Y is linear, then $Y = a + bX$. If the relationship is nonlinear, then it may be $Y = a + b_1 X + b_2 X^2 + \cdots$ or another form. The a and b in the formula are coefficients to be determined; they can be estimated by statistical or other methods.

Time-series data is more effective for short-term prediction. If it is used for long-term prediction, it must be combined with other methods. After getting the prediction curves, we can carry out the further work, namely, predicting future trends of the time-series data.

3.5 Prediction

A time series is a chronological series of observations. By the previous step, we can get one or more fitting curves, and these curves serve as prediction rules. By these prediction rules, we can carry out time series prediction. Given the fitted function and the time parameter, we can obtain the value of the fitting curve function. Making use of this functional relationship, we can predict and control the problem under the given conditions in order to support decision-making and management.

After getting mathematical models of the long-term trends, seasonal changes and irregular changes from the historical data of the time series, we can use them to predict the value T of long-term trends, the value S of seasonal changes and, where possible, the value I of irregular changes.
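The least squares fitting named in Section 3.4, and the evaluation of the fitted curve at a future time described above, can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names `polyfit_least_squares` and `predict` are hypothetical, the polynomial degree is chosen by the caller, the normal equations are solved by plain Gaussian elimination with partial pivoting, and the example series of Ex values is made up for illustration.

```python
def polyfit_least_squares(t, y, degree):
    """Return [a0, a1, ..., ad] minimising sum((y_i - p(t_i))**2)."""
    m = degree + 1
    # Normal equations: A[r][c] = sum(t^(r+c)), b[r] = sum(y * t^r).
    A = [[sum(ti ** (r + c) for ti in t) for c in range(m)] for r in range(m)]
    b = [sum(yi * ti ** r for ti, yi in zip(t, y)) for r in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few distinct time points")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(A[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (b[r] - s) / A[r][r]
    return coeffs


def predict(coeffs, t):
    """Evaluate the fitted polynomial (the prediction rule) at time t."""
    return sum(a * t ** k for k, a in enumerate(coeffs))


# Hypothetical Ex values extracted from five consecutive sample sets.
times = [0, 1, 2, 3, 4]
ex_series = [20.1, 21.9, 24.2, 26.0, 27.8]
rule = polyfit_least_squares(times, ex_series, degree=1)
print("fitted coefficients:", rule)
print("predicted Ex at t=5:", predict(rule, 5))
```

The same two calls would be repeated for the En and He series, giving three prediction curves per data type. A low degree is used here because, as noted in Section 3.4, extrapolation is more reliable for short-term than for long-term prediction.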
The predicted value Y of the future time series is then calculated with one of the following models:

Additive model: $T + S + I = Y$
Multiplicative model: $T \times S \times I = Y$

If it is hard to get the predicted value of irregular changes, we can take the product or the sum of the predicted values of long-term trends and seasonal changes as the predicted value of the time series. If the data itself has no seasonal changes, or there is no need to forecast by quarter or month, the predicted value of long-term trends is the predicted value of the time series, that is, T = Y. In this way, by means of the prediction rule curves, we can predict the time series, discover the laws by which things develop, and support good decision-making.

4. A CASE STUDY

In order to verify the validity of time-series mining based on the cloud model, we carry out an experiment on satellite images in this section. The experiment makes use of real-time satellite cloud images from Chinese meteorological data. Through the analysis of historical data, we can obtain the prediction rules to carry out the prediction of weather trends.

4.1 Data acquisition

In this study, the data sources come from the satellite cloud data of Chinese meteorological data. The website: http://www.nmc.gov.cn/. These data are real-time dynamic data. The satellite takes photographs in real time and then returns the data to earth. We choose three kinds of data as the data source: China and the Western North Pacific Sea Infrared, China Regional Vapour, and China Regional Infrared. These data have the following characteristics: time-series property, synchronization property and consistent view property. We collect a total of 6 months of historical data for the experiments. After collection, we select 1600 images of China and the Western North Pacific Sea Infrared, 1800 images of China Regional Vapour and 2700 images of China Regional Infrared as the sample sets of the experiments. Part of the sample sets are shown in Figure 3, Figure 4 and Figure 5.

Figure 3: China and the Western North Pacific Sea Infrared

Figure 4: China Regional Vapour

Figure 5: China Regional Infrared

4.2 Data pre-processing

To make the data easy to handle, the satellite cloud images are pre-processed before mining the knowledge. The main purpose is to extract the image pixel values. First, the satellite images need to be transformed to the same size. Then, the images are converted into a grid and the corresponding pixel values are extracted. The size of the grid is set depending on the image content. Through the above steps, a pixel matrix can be obtained. Each element of the matrix can be used as a cloud droplet in the cloud transformation. Figure 6 shows the pixel matrix corresponding to a satellite image, that is, the cloud droplets.

    25   4   7  24  30  28  23 ...
    24   5   9  27  28  25  25 ...
    30   6  23  28  27  25  25 ...
    32   9  24  25  24  24  24 ...
    30  22  25  22   2   2  24 ...
    ...

Figure 6: Pixel matrix

4.3 Image Feature Extraction

This step performs feature extraction on the satellite imagery and obtains a collection of the three numerical characteristics based on the cloud model. The inputs are the pixel values obtained from the pre-processing step. Each pixel value is a cloud droplet fed to the backward cloud generator. The implementation follows Section 2.2, and the main functions are as follows:

Calculate the expectation value:
    ImageExp=SelectAveImage(Images, Num);
Calculate the entropy:
    ImageEn=CalculateStdImage(Images, ImageExp, Num);
Calculate the hyper-entropy:
    ImageHe=CalculateReVarianceImage(, ImageEn);

The sets of cloud features of the satellite images are generated in this step and serve as the data source for prediction curve fitting.

4.4 Curve Fitting Prediction

Through the backward cloud generator, we have a sample set of the numerical characteristics based on the cloud model, and we can fit curves to these sample sets. In this study, we use the least squares curve fitting method. For each type of data, we get three prediction curves, that is, the expected value curve, the entropy curve and the hyper-entropy curve. The main function is:

    % Solve for the polynomial fitting coefficients;
    % x, y are the input data, n is the degree of the fit.
    function A=nihe(x,y,n)
    % Measure the length of the data.
    m=length(x);
    % Generate the X matrix from the power sums of x.
    X1=zeros(1,2*n);
    for i=1:2*n
        X1(i)=sum(x.^i);
    end
    X2=[m,X1(1:n)];
    X3=zeros(n,n+1);
    for j=1:n
        X3(j,:)=X1(j:j+n);
    end
    X=[X2;X3];
    % Generate the right-hand side Y.
    Y=zeros(1,n);
    for k=1:n
        Y(k)=sum(x.^k.*y);
    end
    Y=[sum(y),Y];
    Y=Y';
    % Obtain the fitting coefficient vector A (left division solves X*A = Y).
    A=X\Y;

4.5 Prediction analysis

Through the above steps, we get three different projection curves for each type of satellite image. We regard them as the prediction rules to forecast future satellite cloud images. In this study, we use 80% of the sample sets as the training sets and the other 20% of the data as the prediction reference values. By comparing the predicted values with the actual values, we obtain the prediction accuracy rate of the model. The results are shown in Table 1. It shows the results of the different types of satellite
cloud images obtained through the prediction model. The average accuracy rate over the three kinds of satellite cloud images is 88.13%, which meets the projection demands. The results show that the projection of satellite cloud images is feasible.

    Location                                          Training   Test   Projection   Accuracy
    China and the Western North Pacific Sea Infrared      1280    320          360      88.7%
    China Regional Vapour                                 1440    360          360      86.3%
    China Regional Infrared                               2160    540          540      89.4%

Table 1: The results of the satellite cloud projection

5. CONCLUSIONS

Most spatial data have a time dimension and change over time. Time series space data contain the time dimension in their spatial association characteristics, and time series space association rules can be obtained from them through time series space association rule mining. This paper presented a method of mining rules from time series data, which is of important significance for obtaining time series space association rules and applying them in practice. First, by the backward cloud generator and the three numerical characteristics of the cloud model, the model described the features of time series data. Second, the digital features of a series of sample sets were obtained as the training sample sets. Then, from these feature points, rule curves were fitted to obtain predictive models. Finally, the time series data were predicted by the forecasting rules. The experimental results showed that the method is feasible and effective. Through standardization processing, this method can be extended to multiple applications.

Time series data mining is an interdisciplinary science, and further research includes the following aspects:

(1) Other research on curve fitting methods.
(2) The combination of the cloud model with other algorithms.
(3) The visualization of time series and the study of time sequence similarity.

6. ACKNOWLEDGEMENTS

This paper is supported by National 973 (2006CB70305, 2007CB30804), National Natural Science Fund of China (6074300), Best National Thesis Fund (2005047), and National New Century Excellent Talent Fund (NCET-06-068).

7. REFERENCES

[1] D. R. Li, S. L. Wang, W. Z. Shi et al. On Spatial Data Mining and Knowledge Discovery [J]. Geomatics and Information Science of Wuhan University, 2001, 26(6): 491-499.
[2] S. Shekhar, P. Zhang, Y. Huang et al. Trends in Spatial Data Mining. In: H. Kargupta, A. Joshi (Eds.), Data Mining: Next Generation Challenges and Future Directions [C]. AAAI/MIT Press, 2003, 357-380.
[3] G. Box, G. M. Jenkins. Time series analysis: Forecasting and control. Holden Day Inc., 1976.
[4] J. Han, G. Dong, Y. Yin. Efficient mining of partial periodic patterns in time series database. In Proc. 1999 Int. Conf. Data Engineering (ICDE'99), pages 106-115, Sydney, Australia, April 1999.
[5] B. Ozden, S. Ramaswamy, A. Silberschatz. Cyclic association rules. Proceedings of the 15th International Conference on Data Engineering, 1998, 412-421.
[6] Y. Li, P. Ning, X. S. Wang et al. Discovering calendar-based temporal association rules. Data & Knowledge Engineering, 2003, 44: 193-218.
[7] R. Agrawal, K. I. Lin, H. S. Sawhney et al. Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases. Proceedings of the 21st International Conference on Very Large Data Bases, 1995, 490-501.
[8] T. Pavlidis, S. L. Horowitz. Segmentation of plane curves. IEEE Trans. Comput. 23, 1974, 860-870.
[9] S. Park, S. W. Kim, W. W. Chu. Segment-based approach for subsequence searches in sequence databases. In Proceedings of the Sixteenth ACM Symposium on Applied Computing, 2001, 248-252.
[10] K. B. Pratt, E. Fink. Search for patterns in compressed time series. International Journal of Image and Graphics, 2002, 2(1): 89-106.
[11] H. Xiao, Y. F. Hu. Data Mining Based on Segmented Time Warping Distance in Time Series Database [J]. Computer Research and Development, 2005, 42(1): 72-78.
[12] W. H. Cui, Z. Q. Guan, K. Qin. A Multi-Scale Image Segmentation Algorithm Based on the Cloud Model. Proc. 8th spatial accuracy assessment in natural resources, World Academic Union, 2008.
[13] X. Y. Tang, K. Y. Chen, Y. F. Liu. Land use classification and evaluation of RS image based on cloud model. Edited by Liu, Yaolin; Tang, Xinming. Proceedings of the SPIE, 2009, Vol. 7492, 74920N-74920N-8.
[14] K. Qin, M. Xu, Y. Du et al. Cloud Model and Hierarchical Clustering Based Spatial Data Mining Method and Application. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B2, 2008.
[15] D. R. Li, S. L. Wang, D. Y. Li. Spatial Data Mining Theories and Applications. Science Press, 2006.