Modelling of Web Domain Visits by Radial Basis Function Neural Networks and Support Vector Machine Regression

Vladimír Olej, Jana Filipová
Institute of System Engineering and Informatics, Faculty of Economics and Administration, University of Pardubice, Studentská 84, 532 10 Pardubice, Czech Republic, {vladimir.olej@upce.cz, jana.filipova@upce.cz}

Abstract. The paper presents basic notions of web mining, radial basis function (RBF) neural networks and ε-insensitive support vector machine regression (ε-SVR) for the prediction of a time series for the website of the University of Pardubice. The model includes pre-processing of the time series, the design of RBF neural network and ε-SVR structures, a comparison of the results, and time series prediction. The predictions concern short, intermediate and long time series for various ratios of training and testing data. Prediction of web data can be of benefit for web server traffic as a complicated complex system.

Keywords: Web mining, radial basis function neural networks, ε-insensitive support vector machine regression, time series, prediction.

1 Introduction

Few problems have been worked out concerning the modelling (prediction, classification, optimization) of data obtained by web mining [1] from log files and data on a virtual server, a complicated complex system that affects these data. The web server represents a complicated complex system; a virtual system usually works with several virtual machines that operate over multiple databases. The quality of the activities of the system also affects the data obtained using web mining. The given virtual system is characterized by its operational parameters, which change over time, so it is a dynamic system. The data show nonlinear characteristics and are heterogeneous, inconsistent, missing and uncertain. Currently there are a number of methods for modelling data obtained by web mining. These methods can generally be divided into methods with unsupervised learning [2,3] and methods with supervised learning [4]. The present work builds on the modelling of web domain visits with uncertainty [5,6].
The paper presents a problem formulation with the aim of describing the time series of web upce.cz (web presentation visits), including possibilities of pre-processing, which is realized by means of simple mathematical-statistical methods. Next, basic notions of RBF neural networks [7] and ε-SVRs [4,8] are introduced for time
series prediction. Further, the paper includes a comparison of the prediction results of the designed model. The model represents a time series prediction for web upce.cz with pre-processing done by simple mathematical-statistical methods. The prediction is carried out through RBF neural networks and ε-SVRs for different durations of the time series and various ratios Otrain:Otest of training Otrain and testing Otest data (O = Otrain ∪ Otest). It is expected that for the prediction of shorter time series it would be better to use RBF neural networks, and for longer ones ε-SVRs.

2 Problem Formulation

The data for the prediction of the time series of web upce.cz over a given time period were obtained from Google Analytics. This web mining tool, which makes use of JavaScript code implemented in the web presentation, offers a wide spectrum of operational characteristics (web metrics). The metrics provided by Google Analytics can be divided into the following sections: visits, sources of access, contents and conversion. In the visits section one can monitor, for example, the number of visitors, the number of visits and the number of pages viewed, as well as the ratio of new and returning visitors. The geolocation indicator, i.e. from which country visitors come most often, needs to be known because of language mutations, for example. In order to predict the visit rate to the website of the University of Pardubice, Czech Republic (web upce.cz), it is important to monitor the indicator of the number of visits within a given time period. One visit here is defined as an unrepeated combination of IP address and cookies. A submetric is the absolutely unique visit, defined by an unrepeatable IP address and cookies within a given time period. The basic information obtained from Google Analytics about web upce.cz during May 2009 consisted of the following: the total visit rate during the given monthly cycles;
a clear trend is obvious, with Monday having the highest visit rate, which decreases as the week progresses, Saturday having the lowest; the average number of pages visited is more than three; a visitor stays on a certain page five and a half minutes on average; the bounce rate is approximately 60%; visitors generally come directly to the website, which is positive; the favourite page is the main page, followed by the pages of the Faculty of Economics and Administration and the Faculty of Philosophy.

The measurement of the visit rate of the University of Pardubice web page (web upce.cz) took place at regular, controlled intervals. The result represents a time series. The pre-processing of the data was realized by means of simple mathematical-statistical methods: a simple moving average (SMA), a central moving average (CMA) and a moving median (MM), along with simple exponential smoothing (SES) and double exponential smoothing (DES) at time t. Pre-processing was used with the aim of smoothing the outliers while maintaining the physical interpretation of the data. The general formulation of the model of prediction of the visit rate for upce.cz can be stated thusly: y' = f(x_t^1, x_t^2, ..., x_t^m), m = 5, where y' is the number of daily web visits at time t+1, y is the number of daily web visits at time t, x_t^1 is SMA, x_t^2 is CMA, x_t^3 is MM, x_t^4 is SES, and x_t^5 is DES at time t. An example of the pre-processing of the time series of web upce.cz is represented in Fig. 1.
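The smoothing step can be illustrated with a short Python sketch of three of the five methods (SMA, MM and SES). This is an illustrative reconstruction, not the authors' code; the function names, window length and smoothing constant are assumptions.

```python
# Illustrative sketches of three pre-processing methods (not the paper's code).

def sma(series, window=3):
    """Simple moving average over the trailing `window` values."""
    out = []
    for t in range(len(series)):
        w = series[max(0, t - window + 1):t + 1]
        out.append(sum(w) / len(w))
    return out

def moving_median(series, window=3):
    """Moving median over the trailing `window` values."""
    out = []
    for t in range(len(series)):
        w = sorted(series[max(0, t - window + 1):t + 1])
        n = len(w)
        out.append(w[n // 2] if n % 2 else (w[n // 2 - 1] + w[n // 2]) / 2)
    return out

def ses(series, alpha=0.3):
    """Simple exponential smoothing: s_t = alpha*x_t + (1 - alpha)*s_{t-1}."""
    s = [series[0]]
    for x in series[1:]:
        s.append(alpha * x + (1 - alpha) * s[-1])
    return s

smoothed = sma([120, 140, 95, 180, 160], window=3)  # toy daily visit counts
```

CMA and DES follow the same pattern: CMA centres the averaging window on t rather than trailing it, and DES applies exponential smoothing twice so that a trend is captured as well as a level.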
Fig. 1. The pre-processing of web upce.cz visits by SMA

On the basis of the different durations of the time series of web upce.cz, concrete prediction models for the visit rate to the upce.cz web can be defined thusly

y' = f(TS_S^SMA, TS_S^CMA, TS_S^MM, TS_S^SES, TS_S^DES),
y' = f(TS_I^SMA, TS_I^CMA, TS_I^MM, TS_I^SES, TS_I^DES),   (1)
y' = f(TS_L^SMA, TS_L^CMA, TS_L^MM, TS_L^SES, TS_L^DES),

where TS is a time series, S a short TS (264 days), I an intermediate TS (480 days) and L a long TS (752 days) at time t.

3 Basic Notions of RBF Neural Networks and ε-SVR

The term RBF neural network [7] refers to any kind of feed-forward neural network that uses an RBF as its activation function. RBF neural networks are based on supervised learning. The output f(x, H, w) of an RBF neural network can be defined this way

f(x, H, w) = Σ_{i=1..q} w_i h_i(x),   (2)

where H = {h_1(x), h_2(x), ..., h_i(x), ..., h_q(x)} is the set of activation functions (RBFs) of the neurons in the hidden layer and w_i are synapse weights. Each of the m components of the vector x = (x_1, x_2, ..., x_k, ..., x_m) is an input value for the q activation functions h_i(x) of the RBF neurons. The output f(x, H, w) of the RBF neural network represents a linear combination of the outputs of the q RBF neurons and the corresponding synapse weights w_i. The activation function h_i(x) of an RBF neural network in the hidden layer belongs to a special class of mathematical functions whose main characteristic is a monotonic rise or fall with increasing distance from the centre c_i of the activation function h_i(x). Neurons in the hidden layer can use one of several activation functions h_i(x), for example a Gaussian activation function (a one-dimensional RBF activation function), a rotary Gaussian activation function (a two-dimensional RBF activation function), multiquadric and inverse multiquadric activation functions, or Cauchy functions. The Gaussian activation function may be presented in this manner
h_i(x, c_i, r_i) = exp(−‖x − c_i‖² / r_i²),  i = 1, ..., q,   (3)

where x = (x_1, x_2, ..., x_k, ..., x_m) represents the input vector, C = {c_1, c_2, ..., c_i, ..., c_q} are the centres of the activation functions h_i(x) of the RBF neural network and R = {r_1, r_2, ..., r_i, ..., r_q} are the radii of the activation functions h_i(x). The neurons in the output layer represent only a weighted sum of all inputs coming from the hidden layer. The activation function of the neurons in the output layer can be linear, with the output eventually being converted to binary form by a jump instruction.

The RBF neural network learning process requires the number of centres c_i of the activation functions h_i(x) to be set, and the most suitable positions for the RBF centres c_i to be found. Other parameters are the radii of the centres c_i, the rates of the activation functions h_i(x), and the synapse weights W(q, n), which are set up between the hidden and output layers. The design of an appropriate number of RBF neurons in the hidden layer is presented in [4]. One possibility of centre recognition mentioned in [7] is random choice: the positions of the neurons are chosen randomly from the set of training data. This approach presumes that randomly picked centres c_i will sufficiently represent the data entering the RBF neural network. The method is suitable only for small sets of input data; its use on larger sets often results in a quick and needless increase in the number of RBF neurons in the hidden layer, and therefore an unjustified complexity of the neural network. The second approach to locating the centres c_i of the activation functions h_i(x) of the RBF neurons can be realized by a K-means algorithm.

In nonlinear regression, the ε-SVR [4,8,9,10,11] minimizes the loss function L(d, y) with ε-insensitivity [4,11], where L(d, y) = |d − y|, d is the desired response and y is the output estimate. For the construction of an ε-SVR approximating the desired response d, the following extension of the loss function L(d, y) can be used

L_ε(d, y) = |d − y| − ε   for |d − y| ≥ ε,
L_ε(d, y) = 0             otherwise,   (4)

where ε is a parameter. The loss function L_ε(d, y) is called a loss function with ε-insensitivity.
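Equation (4) is compact enough to state directly in code; the following Python sketch is illustrative and the function name is hypothetical, not from the paper.

```python
def eps_insensitive_loss(d, y, eps):
    """Eq. (4): deviations inside the eps-tube cost nothing;
    larger deviations are charged only for their excess over eps."""
    return max(abs(d - y) - eps, 0.0)

small = eps_insensitive_loss(1.0, 1.05, eps=0.1)   # deviation 0.05, inside the tube
large = eps_insensitive_loss(1.0, 1.30, eps=0.1)   # deviation 0.30, outside the tube
```

With eps = 0 the loss reduces to the ordinary absolute error |d − y|; increasing eps widens the tube of deviations that the regression tolerates for free, which is what produces the sparse set of support vectors.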
Let the nonlinear regression model in which the dependence of the scalar d on the vector x is expressed by d = f(x) + n. The additive noise n is statistically independent of the input vector x. The function f(·) and the noise statistics are unknown. Next, let the sample training data be (x_i, d_i), i = 1, 2, ..., N, where x_i is an input vector and d_i is the corresponding value of the model output d. The problem is to obtain an estimate of d, depending on x. For further progress, the estimate of d, called y, is expanded in a set of nonlinear basis functions φ_j(x), j = 0, 1, ..., m_1, this way

y = Σ_{j=0..m_1} w_j φ_j(x) = w^T φ(x),   (5)

where φ(x) = [φ_0(x), φ_1(x), ..., φ_{m_1}(x)]^T and w = [w_0, w_1, ..., w_{m_1}]^T. It is assumed that φ_0(x) = 1, so that the weight w_0 represents the bias b. The solution to the problem is to minimize the empirical risk
R_emp = (1/N) Σ_{i=1..N} L_ε(d_i, y_i),   (6)

under the condition of the inequality ‖w‖ ≤ c_0, where c_0 is a constant. The constrained optimization problem can be rephrased using two complementary sets of non-negative variables: the additional variables ξ_i and ξ'_i describe the loss function L_ε(d, y) with ε-insensitivity. The constrained optimization problem can then be written as equivalent to minimizing the cost functional

Φ(w, ξ, ξ') = C Σ_{i=1..N} (ξ_i + ξ'_i) + (1/2) w^T w,   (7)

under the constraints given by the two complementary sets of non-negative variables ξ_i and ξ'_i. The constant C is a user-specified parameter. Optimization problem (7) can be easily solved in the dual form. The basic idea behind the formulation of the dual form is the Lagrangian function [9], built from the objective function and the restrictions. Lagrange multipliers can then be defined, with their functions and parameters ensuring the optimality of these multipliers. Optimizing the Lagrangian function only describes the original regression problem. To formulate the corresponding dual problem, a convex function can be obtained (for shorthand)

Q(α, α') = Σ_{i=1..N} d_i (α_i − α'_i) − ε Σ_{i=1..N} (α_i + α'_i) − (1/2) Σ_{i=1..N} Σ_{j=1..N} (α_i − α'_i)(α_j − α'_j) K(x_i, x_j),   (8)

where K(x_i, x_j) is a kernel function defined in accordance with Mercer's theorem [4,8]. The solution of the optimization problem is obtained by maximizing Q(α, α') with respect to the Lagrange multipliers α_i and α'_i, subject to a new set of constraints which incorporate the constant C contained in the definition of the functional Φ(w, ξ, ξ'). The data points associated with non-zero Lagrange multipliers define the support vectors [4,8,9].

4 Modelling and Analysis of the Results

The designed model in Fig. 2 demonstrates the prediction modelling of the time series of web upce.cz. Data pre-processing is carried out by means of data standardization; thereby the dependency on units is eliminated. Then the data are pre-processed through simple mathematical-statistical methods (SMA, CMA, MM, SES and DES). Data pre-processing makes a suitable physical interpretation of the results possible. The pre-processing is run for time series of different durations, namely (TS_S^SMA, TS_S^CMA, TS_S^MM, TS_S^SES, TS_S^DES), (TS_I^SMA, TS_I^CMA, TS_I^MM, TS_I^SES, TS_I^DES) and (TS_L^SMA, TS_L^CMA, TS_L^MM, TS_L^SES, TS_L^DES).
The prediction is made for the aforementioned pre-processed time series S, I and L with the help of RBF neural networks, ε-SVR with a polynomial kernel function and ε-SVR with an RBF kernel function, for various sets of training Otrain and testing Otest data.
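The first of these models, the RBF network output of Eqs. (2) and (3), can be sketched in a few lines of numpy; the centres, radii and weights below are illustrative values, not parameters learned from the upce.cz data.

```python
import numpy as np

def rbf_output(x, centres, radii, weights):
    """Eq. (2): f(x) = sum_i w_i * h_i(x), with Gaussian h_i from Eq. (3)."""
    d2 = np.sum((centres - x) ** 2, axis=1)   # squared distances ||x - c_i||^2
    h = np.exp(-d2 / radii ** 2)              # Gaussian activations h_i(x)
    return float(weights @ h)                 # linear output layer

centres = np.array([[0.0, 0.0], [1.0, 1.0]])  # c_i, q = 2 hidden neurons
radii = np.array([1.0, 1.0])                  # r_i
weights = np.array([0.5, 0.5])                # w_i
y = rbf_output(np.array([0.0, 0.0]), centres, radii, weights)
```

At the first centre the first activation equals exactly 1 while the second decays with distance, so the output is a distance-weighted blend of the hidden neurons, which is precisely the linear combination of Eq. (2).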
Fig. 2. Prediction modelling of the time series for web upce.cz: data standardization; data pre-processing by SMA, CMA, MM, SES and DES; modelling by RBF neural networks, by ε-SVR with an RBF kernel function and by ε-SVR with a polynomial kernel function; comparison of the results and prediction y' = f(x_t^1, x_t^2, ..., x_t^m), m = 5

Fig. 3 shows the dependences of the Root Mean Squared Error (RMSE) on the number of neurons q in the hidden layer. The dependences of RMSE on the learning parameter are represented in Fig. 4, and the dependences of RMSE on the centre-selection parameter in Fig. 5, for the training Otrain as well as the testing Otest sets (Otrain_S, Otrain_I, Otrain_L, Otest_S, Otest_I, Otest_L). The learning parameter allows the learning process to overrun a local extreme and continue learning; the centre-selection parameter governs the selection of the RBF centres and guarantees the correct allocation of neurons in the hidden layer for the given data entering the RBF neural network.

Fig. 3. RMSE dependences on the parameter q
Fig. 4. RMSE dependences on the learning parameter
Fig. 5. RMSE dependences on the centre-selection parameter

The conclusions presented in [4,7,10] are verified by the analysis of the results (with 10-fold cross-validation). RMSEtrain is lowered until it reaches the value 0.3 as the number q of neurons in the hidden layer grows towards 125 and with a greater ratio Otrain:Otest.
RMSEtest decreases only when the number q of RBF neurons increases at a significantly slower rate than the ratio Otrain:Otest. The minimum of RMSEtest moves right with an increasing ratio Otrain:Otest. Next, the determination of the optimal number q of neurons in the hidden layer is necessary; a lower number q of neurons leads to an increasing RMSEtest.

Table 1, Table 2 and Table 3 show the optimized results of the analysis of the experiments for different parameters of the RBF neural networks (with RBF activation function), with different durations of the time series, various ratios Otrain:Otest of the training Otrain and testing Otest data sets, and the same amount of learning at p = 600 cycles. Table 2 builds on the best setting of q from Table 1, and Table 3 builds on the best settings of q (Table 1) and of the learning parameter (Table 2).

Table 1. Optimized results of the RMSE analysis using the number of neurons q in the hidden layer (the learning and centre-selection parameters are constant for the given TS)

TS  Otrain:Otest  q    learn. param.  centre param.  RMSEtrain  RMSEtest
S   50:50         80   0.9            1              0.383      0.408
S   66:34         80   0.9            1              0.331      0.463
S   80:20         80   0.9            1              0.343      0.365
I   50:50         125  0.9            1              0.372      0.316
I   66:34         125  0.9            1              0.295      0.440
I   80:20         125  0.9            1              0.352      0.326
L   50:50         25   0.9            1              0.409      0.402
L   66:34         25   0.9            1              0.404      0.415
L   80:20         25   0.9            1              0.409      0.390

Table 2. Optimized results of the RMSE analysis using the learning parameter (q and the centre-selection parameter are constant for the given TS)

TS  Otrain:Otest  q    learn. param.  centre param.  RMSEtrain  RMSEtest
S   50:50         80   0.5            1              0.473      0.385
S   66:34         80   0.5            1              0.341      0.445
S   80:20         80   0.5            1              0.311      0.408
I   50:50         125  0.3            1              0.313      0.344
I   66:34         125  0.3            1              0.302      0.428
I   80:20         125  0.3            1              0.343      0.365
L   50:50         25   0.3            1              0.400      0.387
L   66:34         25   0.3            1              0.398      0.409
L   80:20         25   0.3            1              0.406      0.391

Table 3.
Optimized results of the RMSE analysis using the centre-selection parameter (q and the learning parameter are constant for the given TS)

TS  Otrain:Otest  q    learn. param.  centre param.  RMSEtrain  RMSEtest
S   50:50         80   0.5            1              0.385      0.473
S   66:34         80   0.5            1              0.341      0.445
S   80:20         80   0.5            1              0.311      0.408
I   50:50         125  0.3            3              0.357      0.375
I   66:34         125  0.3            3              0.327      0.405
I   80:20         125  0.3            3              0.376      0.441
L   50:50         25   0.3            1              0.400      0.387
L   66:34         25   0.3            1              0.398      0.409
L   80:20         25   0.3            1              0.406      0.391
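The evaluation protocol behind these tables, splitting a series at a given Otrain:Otest ratio and scoring a predictor by RMSE, can be sketched as follows; the last-value predictor is a deliberately trivial stand-in for a trained network, included only to make the sketch runnable.

```python
import math

def split(series, train_fraction):
    """Split a series into training and testing parts, e.g. 0.8 -> 80:20."""
    k = int(len(series) * train_fraction)
    return series[:k], series[k:]

def rmse(actual, predicted):
    """Root Mean Squared Error, the criterion used throughout the paper."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

series = [float(v) for v in range(100)]        # placeholder visit series
for frac, label in [(0.5, "50:50"), (0.66, "66:34"), (0.8, "80:20")]:
    train, test = split(series, frac)
    naive = [train[-1]] * len(test)            # predict the last training value
    print(label, round(rmse(test, naive), 2))
```

Substituting the RBF network or ε-SVR predictions for the naive forecast, and repeating the split ten times for 10-fold cross-validation, reproduces the protocol in outline.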
The dependences of RMSE on the parameter C are represented in Fig. 6, the dependences of RMSE on the parameter ε in Fig. 7 and the dependences of RMSE on the parameter γ in Fig. 8, for the training Otrain as well as the testing Otest sets (Otrain_S, Otrain_I, Otrain_L, Otest_S, Otest_I, Otest_L). The parameters C, ε and γ are functions of the variations of the kernel functions K(x, x_i) [4,11]. In the learning process the ε-SVR parameters are set using 10-fold cross-validation. The parameter C controls the trade-off between the errors of the ε-SVR on the training data and margin maximization; ε [4] selects the support vectors in the regression structures; the degree d represents the rate of the polynomial kernel function K(x, x_i); and the coefficient γ characterizes the polynomial and RBF kernel functions.

Fig. 6. RMSE dependences on the parameter C
Fig. 7. RMSE dependences on the parameter ε
Fig. 8. RMSE dependences on the parameter γ

The confirmation of the conclusions presented in [4,11] is verified by the analysis of the results. RMSEtest for the ε-SVR with RBF kernel function lowers toward zero with decreasing C (in the case of user experimentation) and a higher ratio Otrain:Otest. With 10-fold cross-validation, RMSEtest moves towards zero with an increase of the parameter γ for Otrain_S. For the ε-SVR with polynomial kernel function, RMSEtest significantly decreases when the parameter ε decreases (Fig. 7). The minimum of RMSEtest moves right with an increase of the parameter γ (the minimum is between 0.2 and 0.3), whereas an indirect correlation between the ratio Otrain:Otest and RMSEtest obtains. Table 4 and Table 5 show the optimized results of the analysis of the experiments for different parameters of the ε-SVR (with RBF and polynomial kernel functions), with different durations of the time series, various ratios Otrain:Otest of the training Otrain and testing Otest data sets, and the same amount of learning at p = 600 cycles. The tables do not present the partial results for the various parameters, but only the resulting set of parameters for each TS and ratio Otrain:Otest.
Table 4. Optimized results of the RMSE analysis using the parameters C, ε and γ (ε-SVR with RBF kernel function)

TS  Otrain:Otest  C   ε    γ    RMSEtrain  RMSEtest
S   50:50         8   0.2  0.4  0.343      0.414
S   66:34         10  0.1  0.4  0.365      0.314
S   80:20         10  0.1  0.4  0.355      0.306
I   50:50         10  0.1  0.4  0.331      0.369
I   66:34         10  0.1  0.4  0.342      0.361
I   80:20         9   0.1  0.4  0.350      0.336
L   50:50         6   0.1  0.2  0.382      0.409
L   66:34         4   0.1  0.2  0.383      0.416
L   80:20         4   0.1  0.2  0.384      0.445

Table 5. Optimized results of the RMSE analysis using the parameters C, ε, γ and d (ε-SVR with polynomial kernel function)

TS  Otrain:Otest  C   ε    γ    d  RMSEtrain  RMSEtest
S   50:50         10  0.1  0.2  1  0.411      0.319
S   66:34         10  0.2  0.2  2  0.474      0.379
S   80:20         10  0.1  0.2  3  0.587      0.509
I   50:50         6   0.1  0.3  1  0.385      0.416
I   66:34         10  0.1  0.3  2  0.436      0.486
I   80:20         6   0.1  0.3  3  0.526      0.567
L   50:50         10  0.1  0.2  2  0.455      0.468
L   66:34         6   0.1  0.2  1  0.385      0.416
L   80:20         8   0.1  0.2  1  0.385      0.443

The original values y of TS_S (Otest) compared with the predicted values y' (at time t + Δt, Δt = 1 day) using the RBF neural network are displayed in Fig. 9. In Fig. 10 TS_S (Otest) y is then compared to the values y' predicted using the ε-SVR (with RBF kernel function).

Fig. 9. TS_S with values predicted by the RBF neural network
Fig. 10. TS_S with values predicted by the ε-SVR

Table 6 compares RMSEtrain and RMSEtest on the training and testing sets for other designed and analyzed structures: a Takagi-Sugeno fuzzy inference system (FIS) [5,6], a Takagi-Sugeno intuitionistic fuzzy inference system (IFIS) [5,6], feed-forward neural networks (FFNNs), RBF neural networks, and ε-SVR1 (ε-SVR2) with an RBF (polynomial) kernel function, with the input data pre-processed by simple mathematical-statistical methods.

Table 6. Comparison of RMSEtrain and RMSEtest on the training and testing data for other designed and analyzed structures of fuzzy inference systems and neural networks

           FIS    IFIS   FFNN   RBF    ε-SVR1  ε-SVR2
RMSEtrain  0.221  0.224  0.593  0.311  0.331   0.385
RMSEtest   0.237  0.239  0.687  0.408  0.369   0.416
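For readers who want to reproduce the ε-SVR experiments in outline, scikit-learn's SVR exposes the same C, ε and kernel parameters (the paper itself used STATISTICA); the synthetic weekly series and the parameter values below are assumptions for illustration, not the tuned values of Tables 4 and 5.

```python
# Sketch of the epsilon-SVR experiment on a synthetic weekly visit pattern,
# using scikit-learn's SVR in place of the tools used in the paper.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
t = np.arange(200, dtype=float)
visits = np.sin(2 * np.pi * t / 7) + 0.1 * rng.standard_normal(200)  # weekly cycle + noise

lags = 5                                    # m = 5 inputs, as in the paper's model
X = np.array([visits[i:i + lags] for i in range(len(visits) - lags)])
y = visits[lags:]
split = int(0.8 * len(X))                   # the 80:20 Otrain:Otest ratio

model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=0.4)
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])
rmse = float(np.sqrt(np.mean((pred - y[split:]) ** 2)))
```

Swapping `kernel="rbf"` for `kernel="poly", degree=d` gives the polynomial-kernel variant, and wrapping the fit in scikit-learn's cross-validation utilities reproduces the 10-fold protocol.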
5 Conclusion

The proposed model consists of data pre-processing and actual prediction using RBF neural networks as well as ε-SVR with polynomial and RBF kernel functions. Furthermore, the modelling was done for time series of different lengths and different parameters of the neural networks. The analysis results for various ratios Otrain:Otest show trends for RMSEtrain and RMSEtest. The analysis of all obtained results of modelling the time series by the RBF neural network (ε-SVR) shows that RMSEtest takes minimum values [4,10,11] for TS_I (TS_S). A further direction of research in the area of modelling web domain visits (beyond the use of uncertainty [5,6] and modelling by RBF neural networks and machine learning by ε-SVR) is focused on different structures of neural networks. The crux of the modelling is different lengths of time series, various ratios Otrain:Otest and different techniques of their partitioning. Prediction using web mining gives a better characterization of the webspace; on the basis of that, system engineers can better characterize the load of a complex virtual system and its dynamics. The RBF neural network (ε-SVR) design was carried out in SPSS Clementine (STATISTICA).

Acknowledgments. This work was supported by the scientific research project of the Ministry of Environment, the Czech Republic, under Grant No. SP/42/60/07.

References

[1] Cooley, R., Mobasher, B., Srivastava, J.: Web Mining: Information and Pattern Discovery on the World Wide Web. In: 9th IEEE International Conference on Tools with Artificial Intelligence, ICTAI '97, Newport Beach, CA (1997)
[2] Krishnapuram, R., Joshi, A., Yi, L.: A Fuzzy Relative of the K-medoids Algorithm with Application to Document and Snippet Clustering. In: IEEE International Conference on Fuzzy Systems, pp. 1281-1286, Korea (1999)
[3] Pedrycz, W.: Conditional Fuzzy C-means. Pattern Recognition Letters, 17, 625-632 (1996)
[4] Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice-Hall Inc., New Jersey (1999)
[5] Olej, V., Hájek, P., Filipová, J.: Modelling of Web Domain Visits by IF-Inference System.
WSEAS Transactions on Computers, Issue 10, 9, 1170-1180 (2010)
[6] Olej, V., Filipová, J., Hájek, P.: Time Series Prediction of Web Domain Visits by IF-Inference System. In: Proc. of the 14th WSEAS International Conference on Systems, Latest Trends on Computers, N. Mastorakis et al. (eds.), Vol. 1, Greece, pp. 156-161 (2010)
[7] Broomhead, D.S., Lowe, D.: Multivariable Functional Interpolation and Adaptive Networks. Complex Systems, 2, 321-355 (1988)
[8] Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, Cambridge (2000)
[9] Smola, A.J., Schölkopf, B.: A Tutorial on Support Vector Regression. Statistics and Computing, 14, 199-222 (2004)
[10] Niyogi, P., Girosi, F.: On the Relationship between Generalization Error, Hypothesis Complexity, and Sample Complexity for Radial Basis Functions. Massachusetts Institute of Technology Artificial Intelligence Laboratory, Massachusetts (1994)
[11] Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer-Verlag, New York (1995)