A SVR-Based Data Farmg Techque for Web Appcato Ja L 1 ad Mjg Peg 2 1 Schoo of Ecoomcs ad Maagemet, Behag Uversty 100083 Bejg, P.R. Cha Ja@wyu.c 2 Isttute of Systems Scece ad Techoogy, Wuy Uversty, Jagme 529020, Guagdog, P.R. Cha reggepeg@163.com Abstract. I order to sove the probem that the performace of web appcato ca't meet users' expectato whe may users access dyamc data through the appcato, a data farmg optmzg techque for web appcato s proposed. I the proposed techque, the web appcato s based o XML ad the data are coverted to XML format before beg preseted through XSL. To reduce page respose tme, SVR s empoyed to forecast user's requests. Ad based o the requests, a data farmg aget s used to create web pages that may be eeded by users. At ast, expermets are coducted to compare page respose tme for three web stes: a dyamc web ste wthout data farmg optmzato, a data farmg optmzed web ste, ad a SVR based data farmg optmzed web ste. It s proved that page respose tme for the ast web ste s the east, whch proves that the proposed techque s effectve. Keywords: Data farmg, Web appcato, SVR, XML, Aget 1. INTRODUCTION I the e-commerce era, dyamc formato customzed for each user web appcatos provdes busess the opportuty of success. Ad performace of web appcatos s a crtca factor to the success of the busess. However, the dyamc formato s preseted from dyamc data database through dyamc web pages, whch requre much processg tme, ad hece the process woud ower the performace whe there are may users accessg the web appcato. At ths tme, opportutes are that the operatg systems are over-worked. There are two ways to sove the probem. Oe souto s upgradg the hardware supportg web appcatos. However, the upgrade requres more expedture ad more mateace efforts. The other souto s utzg the processg capabty the de tme through the techque of data farmg. Ad t s aso proposed ths paper. Data farmg meas that the processg abty of the de tme s used to geerate statc web pages that are possby be used ext operatoa sesso. I the proposed techque, Support Vector Regresso, or SVR, s empoyed to forecast users' eeds so that the approprate statc pages ca be prepared. The rest of ths paper s orgazed as foows. I secto 2, cocepts of web appcato are troduced. Ad as a mportat measure of performace, respose
434 Ja L ad Mjg Peg tme s descrbed ad aayzed. I secto 3, cocepts, stregths ad appcatos of data farmg are preseted. I secto 4, fudametas of SVR are show. Ad the, secto 5, the framework of the proposed techque s ustrated. Ad secto 6, expermets of three web stes: a dyamc web ste wthout data farmg optmzato, a data farmg optmzed web ste, ad a SVR based data farmg optmzed web ste are coducted to prove that the proposed techque s effectve. 2. WEB APPLICATION A web appcato s a mut-ter appcato system, whch commucates wth a appcato server through Hypertext Trasfer Protoco (HTTP) ad Trasmsso Cotro Protoco / Iteret Protoco (TCP/IP). It s capabe of dyamcay processg data ad the presetg the formato to users through web browsers. Stregths of web appcatos are: a) User-fredy terface. A formato of web appcato s preseted through web browsers, whch makes the appcato easy to use ad the users do t eed specazed trag; b) Easy mateace ad upgrades. Because that there s o eed to sta specazed software the cet sde, the oy job eeded to upgrade the system s upgradg the software system the server sde, ad t reduces mateace ad upgrade costs; c) Scaabty. The use of stadard TCP ad HTTP eabes the modue depedet ad thus makes system easy to expad; ad d) Hgh degree of formato sharg. Web appcatos use HTML/XML to trasmt formato, whch meas the data format s a ope stadard. Ad hece the dustry has gve web appcato broad supports. Due to above stregths, web appcatos have bee wdey used to provde customers ad parters wth customzed, coveet, cheap, ad effcet formato servces. Ad thus, compettveess of eterprses s promoted. However, arge accesses hder web appcatos from meetg user performace expectato, whch woud cosequety hder the success of busess. The performace s measured wth respose tme [1]. Respose tme of Web appcatos s the tme spet a HTTP request. It starts whe the cet seds a HTTP request ad eds whe the cet receves the frst byte data from the server. Respose tme uder mut-user requests refers to the respose tme for a user who radomy accesses the webstes wth some other users. The respose tme uder mut-user requests s deoted by, where superscrpt refers to the user detfcato, ad subscrpt refers the tota users. Assume that the respose tmes for users who cocurrety access the web appcato are 1,,,,, the average respose tme coud be computed usg equato (1)., 1 (1)
3. DATA FARMING A SVR-Based Data Farmg Techque for Web Appcato 435 Data farmg maxmzes system performace by esurg that formato devered to users by web appcato s avaabe whe the users requre them. The data farmg process s executed by a data farmg aget, whch are caed as a data farmer. It s a depedet computer process that s costaty ookg for opportutes to mprove the abty of web appcato to serve customers. It watches the web appcato server performace statstcs ad starts framg at ay pot whch the demad o the server drops beow a defed threshod. Ad the process cossts of four steps: a) Seect seeds, b) Grow, c) Harvest ad d) Store. The process s ustrated fgure 1. Store Harvest Grow Seect Seeds Data Fgure 1. The Process of Data Farmg 3.1 Seect Seeds Achevg argest profts by provdg rght products or servces to customers s the purpose of e-commerce web appcato. Ad what products or servces customers w choose coud be predcted through hstorca trasacto records [2, 3]. Ths research focuses o the data of eterprse customers ad proftabe products these customers may purchase, ad takes them as seeds ad grows them to prepared web pages presetg formato to customers. Seed s a combato of features from users ad products or servces that are to be devered by a web appcato. I ths step, seeds that are to be urtured are seected accordg to the forecastg resuts based o features of users ad products or servces usg SVR. The crtero for choosg s the probabty of users may terest the servces or products.
436 Ja L ad Mjg Peg 3.2 Grow I ths step, seected seeds eed to grow to statc web pages. These seeds ad some reevat data w be coected together to form the formato that users eed. The reevat data are cassfed to three casses: reevat data of customers, reevat data of products, ad system formato. For exampe, product recommedato page eeds user s formato of og status, formato about products ad user s profe. Ad the formato from these sources s coected ths step. 3.3 Harvest The coected data are structured usg XML format ad trasformed to web pages va XSL [4]. XML stads for extesbe Markup Laguage. It was desged to descrbe data ad to focus o what data s. Ad XSL s a famy of recommedatos for defg XML documet trasformato ad presetato. The process of harvest s ustrated fgure 2. Database Iformato of the user Iformato of products XML XSL Output Memory Cofgurato fes System Iformato Fgure 2. The Process of Harvest The process fuy expots the beefts of XML through ts vrtua XML tree approach. Each output pages from the system derves from a XML source. 3.4 Store These pages are stored somewhere fe system of web server, so that they ca be accessed whe users request. Ad those frequety accessed pages are cached to mprove the performace of the appcato.
A SVR-Based Data Farmg Techque for Web Appcato 437 4. SVR I ths techque, SVR s empoyed to forecast the probabty of a customer purchasg a product. Accordg to the probabty, seeds are seected to grow to statc web pages to reduce workoad of appcato busy tme. Statstca Learg Theory proposed by Vapk s a theory specazed o the earg probem through a mted umber of observatos [5]. Support Vector Mache (SVM) proposed based o the theory s a ew way for sovg oear probems. Through troducg- sestve oss fucto, SVM has bee exteded to sove probems of oear regresso. Ad the ew techque s caed as Support Vector Regresso. For a sampe dataset x, y,, x, y X, the purpose of ear regresso s 1 1 to fd the weght coeffcet w equato (2). y Xw (2) Where, s a coeffcet to be mmzed. Cosderg the purpose of regresso fucto ad -sestve oss fucto, ear regresso probem ca be trasformed to foowg optmzg probem: 1 m ( w,, ) ( ww) C (3) 2 1 Where, s the umber of observato, C s a fxed vaue that s used to spt the regresso error ad fucto feature. Above equato ca be trasformed to the foowg optmzg probem: max W(, ) ( y ) ( y ) 1 1 2, j1 j jx x j (4) 0 C, 1,2,, s.. t 0 C, 1,2,, ( ) 0 1 For oear regresso, the kere fucto s troduced to repace the er product x x j. Ad the optmzg probem equato (4) coud be trasformed to the foowg equato. (5)
438 Ja L ad Mjg Peg max W(, ) ( y ) ( y ) 1 1 2, j1 Kx, x j j j Costrats equato (6) are the same as equato (5). Vaues of parameters ad ca be obtaed by sovg equato (6). O the bass of KKT codtos, there are ot may ad that are uequa to 0. Sampes ( X, y ) correspodg to ad are referred as support vectors. The regresso fucto s the foowg form: Where, f x K x x b (7) ( ) ( ) (, ) SVs 1 b ( )[ K( x, x ) K( x, x )] (8) r s 2 SVs Ad xr, x s are support vectors. Equato (7) s the SVR mode. From above dscusso, we ca fd that t has two advatages. Frst, t s based o kere mappg whch esures ts oear processg abty. Secod, t s based o statstca earg theory, so t sythetcay takes fttg error ad fucto characterstcs of regresso mode to accout, so that ts geerazato abty s aso guarateed. (6) 5. SVR BASED DATA FARMING TECHNIQUE 5.1 Framework The whoe appcato ca be two phases: busy tme phase ad de tme phase. I the busy tme phase, the process of appcato s busy wth the requests from web users. Whe the process ca t fd a correspodg statc web page for a request, the job of respose woud be competed by dyamc pages. Ad the oy job data farmer does s watchg CPU, whe the percetage of de tme s arger tha a predefed a threshod, the appcato gets to de tme phase [6]. I the de tme phase, data farmer eeds to do a job besdes watchg CPU: farmg data of customers ad products, ad geeratg statc web pages. Whe the percetage of de tme s beow the predefed threshod, the process goes to the busy tme phase aga.
A SVR-Based Data Farmg Techque for Web Appcato 439 Motor CPU Data Farmer Database Forecast SVR Seeds Web Pages Dyamc Pages Web Appcato Fgure 3. Framework 5.2 Impemetato The database cudes hstorca database ad the customer database. A customer trasactos records are stored hstorca memory database, ad a customers profes are stored customer database. Database s based o Mcrosoft SQL Server 2000 patform. As SVR s MatLab code [7], ad appcato server s Mcrosoft.Net 2005, Combuder of MatLab s empoyed to compe SVR MatLab code to COM d [8], ad the SVR ca be caed by.net 2005. Data farmer s mpemeted as a depedet process, whch s resposbe for motorg CPU ad geeratg statc web pages. 6. EXPERIMENTS A webste wthout data farmg optmzato, a data farmg optmzed web ste, ad a SVR based data farmg optmzed web ste are but to verfy the performace of a web appcato. The web appcato s a Bog web ste ad provdes free basc servce. However, ts extra servces such as theme servces charge. The servces are provded by a PC server wth 2G memory. Ad the operatg system s wdows 2003 server stadard verso. Expermet parameters are show tabe 1.
440 Ja L ad Mjg Peg Tabe 1. Expermet Parameters Parameter Records for trag 317 Vaue SVR kere fucto 2 2 exp x xj / SVR sestve oss fucto 0.1 SVR pushmet factor C 60 2 6 LoadRuer 8.0 s used to coduct the test [9]. Ad the rage of the umber of cocurret users s from 50 to 500. Expermet resuts are potted fgure 4. Fgure 4. A Comparso of Expermeta Resuts The resuts dcate that data farmg s a effectve techque for mprovg performace of web appcato busy tme by trasformg dyamc web pages to statc web pages the de tme. Ad the proposed SVR based data farmg techque performs best that SVR heps seect sutabe seeds ad hece mprove the effcecy of data farmg. 7. CONCLUSIONS Data farmg s a effectve techque for mprovg performace of web appcato busy tme by trasformg dyamc web pages to statc web pages the de tme. Through the forecast of SVR, the proposed techque performs we mprovg effcecy of web appcato.
A SVR-Based Data Farmg Techque for Web Appcato 441 REFERENCES 1. Aoymous, Respose Tme, TechTarget (2007). http://searchetworkg.techtarget.com/sdefto/0,sd7_gc212896,00.htm (Accessed Juy 14, 2007). 2. D.C. Schmtte, D.G. Morrso, ad R. Coombo, Coutg your customers: who are they ad what w they do ext, Maagemet Scece. Voume 33, Number 1, pp.1-24, (1987). 3. D.C. Schmtte ad R.A. Peterso, Customer base aayss: a dustra purchase process appcato, Marketg Scece. Voume 13, Number 1, pp.41-67, (1994). 4. Aoymous, Extesbe Markup Laguage, W3C (2007). http://www.w3.org/xm (Accessed Juy 14, 2007). 5. V.N. Vapk, Statstca Learg Theory (Wey: New York, 1998). 6. D. Bure, A. A-Zobade, G. Wda, ad A. Buter, Sef-Optmsg Data Farmg for Web Appcatos, Proc. of the 15 th Iteratoa Workshop o Database ad Expert Systems Appcatos (DEXA 2004), eds. F. Gado, M. Takzawa, ad R. Traumüer (Sprger: Bosto, MA 2004), pp.436-440. 7. G.C. Cawey, MATLAB Support Vector Mache Toobox (v0.55\beta), Schoo of Iformato Systems, Uversty of East Aga, U.K. (2000). http://theova.sys.uea.ac.uk/svm/toobox/ (Accessed Juy 14, 2007). 8. Aoymous, Create MATLAB based.net ad COM compoets, Mathworks (2006). http://www.mathworks.com/products/etbuder/ (Accessed Juy 14, 2007). 9. Aoymous, Load Testg Software: Automated Performace Testg ad Web Testg Software, Mercury (2007). http://www.mercury.com/us/products/performaceceter/oadruer/