Improved SVM in Cloud Computing Information Mining

Internatonal Journal of Grd Dstrbuton Computng Vol.8, No.1 (015), pp.33-40 http://dx.do.org/10.1457/jgdc.015.8.1.04 Improved n Cloud Computng Informaton Mnng Lvshuhong (ZhengDe polytechnc college JangSu NanJng 11106) 171375403@qq.com Abstract How to have a better mnng and use of nformaton n the cloud computng envronment consttutes the drecton of current research n the feld of cloud computng; ths paper ntroduces support vector machne () concept n the cloud computng data mnng, ntroduces a penalty factor n the, and mproves data mnng algorthms. The constructed Map / Reduce model by the concept of featured mult-tree conducted the valdaton of the model. Smulaton results show that data mnng methods of ths model have effectvely mproved the accuracy and the tme of nformaton mnng, hence have some practcal sgnfcance. Keywords: data mnng; penalty factor; Map / Reduce model 1. Introducton The growng exponentally cloud computng data no doubt poses a severe challenge to cloud computng data mnng [1]. In the cloud computng, the data mnng s based on a comprehensve analyss of dfferent cloud nodes; people fnd the value of the nformaton n the process of nformaton mnng, hence conduct processng nformaton n cloud computng envronment and provde a favorable data analyss for decson-makng departments. How to dg out the nformaton needed for the reader out of vast amounts of data n cloud computng envronment, s a realstc problem ndeed to be resolved. At present, the commonly used data mnng algorthms are neural networks, decson trees and Bayesan networks []; most methods conduct data mnng under dstrbuted envronment, hence these data mnng algorthms take up large storage space, ncrease network load,and extend the response tme [3-4]. Lterature [5] proposed a data mnng algorthm based on dynamc cloud computng model, and smulaton results show that the algorthm has certan advantages n terms of handlng massve amounts of data. Lterature [6] proposed an effcent data mnng framework on Hadoop, usng the database to smulate the chan structure, managng the excavated knowledge. Lterature [7] proposed a cloud computng network health assessment method based on support vector machne. Smulaton results show the feasblty of the method. Lterature [8] by usng the support vector machnes n coal ndustry under the cloud computng condtons, effectvely conducted mnng and analyss for coal data and acheved some results.. Related Knowledge.1 Support Vector Machnes Support vector machne (referred to as ) s a machne learnng method based on the VC theory, whose basc dea s to rely on a hyperplane as a decson, and make the maxmum dstance between the postve and negatve modes. Relyng on the optmal classfcaton surface, has been bult up; the optmal classfcaton surface has correctly separated the classfed surfaces, whle mantanng the maxmum spacng between classfcaton surfaces; the vector closest to the optmal classfed hyperplane becomes ISSN: 005-46 IJGDC Copyrght c 015 SERSC

Internatonal Journal of Grd Dstrbuton Computng support vector. The optmal partton hyperplane s calculate and descrbed as a condtonal extremum problem, obtanng the optmal soluton by Lagrange functon saddle pont: 1 Lp w C a{ y ( ( x ) w b) 1 } (1) By kernel functon theorem, the optmal soluton must satsfy the followng condtons w a y( x ) () 0 a C, a y 0 (3) In equaton (), under normal crcumstances, the default value of a s 1, the result of w s nonzero of x. The support vector machne takes the tranng data set as a unt to descrbe the features of unt of data; n aspects of makng the equvalent classfcaton and segmentaton data sets, vector machne occupes a very small part.. Data Mnng n the Cloud Computng Envronment Currently, data mnng n the cloud computng envronments reles on web logs as a study, by Web data mnng technques to understand and analyze the daly behavor of the user; log mnng n the cloud computng envronment s dvded nto: (1) classfcaton analyss, manly conducts classfcaton and analyss of user behavor characterstcs under cloud computng condtons; () assocaton rules analyss, to analyze the user pages behavor under cloud computng envronment; (3) frequent sequence access analyss: to analyze users frequently access under cloud computng condton, hence obtan access behavor patterns of users n the cloud computng, understand n-depth mportance of the relevant page, and mprove the operatng effectveness of Web ste; but because of the cloud tself has a broad dstrbuton, t tends to have large volumes of data, and s wdely dstrbuted wth heterogeneous structure, and the data s n real-tme transformaton, hence has greatly ncreased the complexty of the data structure, and brngs dffculty to data mnng n cloud computng. Especally n the data mnng process, the constant and multple access to databases presents new requrements to data mnng n cloud computng envronment..3 Map / Reduce Model Map / Reduce model n cloud computng envronment s a dstrbuted model; n dealng wth a large amount of data t has been wdely used [9]. Prncple of Map / Reduce model s by usng Map functon to dvde the nput process nto a large number of data peces, and the data segment unts wll be delvered to the computer for processng; by Reduce functon the processed data fragments wll be ntegrated together, and fnally output a summary; n Map / Reduce model, Master s responsble for task schedulng and sharng the data between the node; Worker s prmarly responsble for task processng and data processng n cloud computng [10-11]. As shown n Fgure 1 34 Copyrght c 015 SERSC

Internatonal Journal of Grd Dstrbuton Computng Userprogram message segmentaton desgntaonmap segmentaon Man-control program desgnaton Reduce segmentaton Fragment0 Fragment1 Workmachne Workmachne Local wrte Remote mreadng Workmachne Workmachne wrte wrte Output Fle Output Fle Fragment Fragment3 Fragment4 Workmachne Input Fle Map state Intermedtae Fle Reduce state Output Fle Fgure 1. Map / Reduce Model 3. n Cloud Computng Model Envronment 3.1 Support Vector Machnes wth the Introducton of Penalty Factor Ths paper has frst ntroduced n the penalty factor, assumng that x represents the nput vector, F s hgh-dmensonal nner product space; feature vector z F, z ( x), n whch s the characterstc functon, kernel functon s k( x, xl ) ( x ) ( xl ). Then optmzaton problems related to optmzaton features are: mn1 s. t w z b 1 I w z b 1 j J w j (4) In Equaton (4), z means lnearly separable; after the ntroducton of the penalty factor, related optmzaton problems have been transferred as: c mn1 w k s. t w z b 1 I (5) w z b 1 j J j 3. Introducton of Dstrbuted Support Vector Machnes n Cloud Computng Model When the data s large-scale n cloud computng system, the calculatng process of requres a very large amount of calculaton, therefore, the data n ths paper proposes usng cloud computng-based dstrbuted classfcaton algorthm for large-scale, complex cloud data, and the algorthm steps are as follows: (1) assgn data records of cloud computng envronment to dfferent computng nodes, and use the target equaton of the mproved support vector machnes functon: Copyrght c 015 SERSC 35

Internatonal Journal of Grd Dstrbuton Computng c f ( x ) mn1 w x (6) k () In each computng nodes, respectvely, to calculate the equaton value of for each node a y( x ), y( x) 以及 x k x (3) After the calculaton of each node, merge summaton the correspondng ntermedate equatons values of each computng node, and accordng to the Lagrange functon factor vectors to obtan partal dfferental evaluaton of three mddle equatons, namely: n f 0 x a y( x ) x 1 n f 0 y y ( x) (7) y 1 f 0 x x x k (4) Accordng to (3), the expresson result s 0, resultng n the value of the optmal soluton x. 4. Mnng Algorthms Introduced Support Vector Machne based on Map / Reduce Model 4.1 Algorthm Steps Descrpton The algorthm s dstrbuted computng framework based on Map / Reduce model, usng three selecton algorthm to obtan overall frequent tem sets of data; n the cloud computng envronment the documentaton set s the documents collecton to be dealt wth under the assumpton n the cloud computng envronment. Take Dn { txt1, doc, excel 3 } as an example, steps are as follows: Step 1: Accordng to the contents of the document collecton, usng the Map functon to convert the data block f ( D ) c, D ; through Reduce functon t s expressed as n f ( D ) c, D, c, f ( D ) n n Step : n the output c, f ( D ) we ntroduce equaton (6) to calculate object equaton for each document n n D ; and ntroduce equaton (7) to calculate the optmal soluton set of the document: Dxn { txtx 1, docx, excel x3 }. Step 3: take f( D xn) n Step as the second entry of Map functon, and then get Map functon processng data block f ( Dxn ) cx, f ( Dxn ) Step 4: Repeat steps -3 untl teratons reach 4. 4. Algorthm Implementaton In ths paper, we use three selectons algorthm n the Map / Reduce model to get overall frequent tem sets, the algorthm mplementaton s as follows: 1: A varety of documents n cloud computng, D n ={txt 1,doc,excel 3 },N=3 :whle(n<3) 3:{Map(); 4: for(=1;<n;++) 36 Copyrght c 015 SERSC

Internatonal Journal of Grd Dstrbuton Computng 4.3 Algorthms Example 5: Fre()=Fre(-1)+1; 6: ouput(doc, Fre(<c x,doc x >)); 7: λ j =λ.frequency(); 8: Endfor 9: EndMap 10:Reduce() 11:Input(doc, Fre(<c x,doc x >)); 1:for(c x record(doc x )) 13:output(c x,fre(c x,doc x >)); 14:endReduce; c 15: f ( cx ) mn1 w x k 16: Fre(c x ) c x 17:N=N+1 18:EndWhle To further llustrate the usefulness of model n ths paper, the author ntroduced of the concept of characterstcs mult-tree from Lhe lterature [1] : nodes (excludng leaf nodes) of each layer of characterstcs mult-tree correspond to a collecton of the feature propertes; the sde corresponds to dfferent values of the data characterstcs, and data characterstcs s correspondng the sdes of mult-tree. The result of node storage s the value of mult-tree; each node n the mult-tree saves a certan value and the number of records. Steps are as follows: 1: Input: tranng samples are dstrbuted n subnodes of cloud computng, labeled as X = {X1, X... Xn}, reflectng support vector collecton ET of X1, X... Xn; node weghts n descendng order, E = {E1, E... En}, : Output: N of cloud feature mult-tree 3: Content: :For k=1 to N 4: If(k==1) 5: / / from N to get all Count (N) values of E1 and the number of records, and then create a node Enode accordng to statstcs n n 6: n 1 1 N 7:For k=1 to Count(N) 8: ENode kj (value,count)=createtree(e N, ) 9:EndFo 10:else 11:// Get the statstcal value of ET that meet the condtons of each node C(N, and the number of records; create node ENode kj based on the results of values respectvely. 1:For k=1 to Count(N) 13: For j=1 to C(N) 14: ENode kj [Count]=CreateTree(E k,, n ) 15: N+1 =CreateTree(ET ) Copyrght c 015 SERSC 37

Internatonal Journal of Grd Dstrbuton Computng 16: f ( n) 1 n1 N 17: EndFor 18:EndFor 5. Expermental Smulaton n Ths algorthm uses Cloudsm platform and selects network centry of the author s nsttute; the author has selected three supermarket near the school as cloud, and the supermarket nformaton wll be seen as a classfcaton problem; set N gven tranng samples f (, ) m x y,n whch x represents the nput sample, y means the sample output; usually -1 means loss means and +1 ndcates the presence. Due to retrevng books under cloud condtons related to the dynamc nature, and dffcult to be captured, support vector machne adopts the kernel functon k( x, y) exp( x y ) /, where m x, j 1 y. Extract the overall books nformaton data and number of readers m obtaned from several lbrares forecasts : the author extracted the man ngredent values to obtan data sets, and collected overall support vector data sets as tranng data, usng support vector machne algorthm for global mnng. Focused on the nformaton for each supermarket, the author extracted accordng to a certan proporton 1000,000,3000 records of product nformaton as measurement sample evenly dstrbuted n three dfferent databases, usng Lterature [13], [14] and the algorthm of ths paper to compare ther vector machne number, operaton duraton, classfcaton accuracy rate, and the results are shown n Table 1. Ths paper compares and dstrbuted, centralzed, as shown n Table. Table 1. Comparson between Dstrbuted Algorthm Name Supermarket 1 Product Informaton Supermarket Product Informaton Supermarket 3Product Informaton Algorthm Lterature[13 ] algorthm Lterature[14 ] algorthm algorthm Lterature[13 ] algorthm Lterature[14 ] algorthm algorthm Lterature[13 ] algorthm Lterature[14 ] algorthm algorthm Support Vector Run Tme (sec) Classfcaton accuracy 70 19.34 63.7% 75 15.3 68.37% 85 1.05 80.15% 55 36.1 49.18% 6 3.14 53.7% 67 8.96 80.19% 80 56.1 70.1% 84 39.7 73.19% 87 0.3 85.7% From Table 1 t s found that, when compared algorthm wth Lterature [13], 38 Copyrght c 015 SERSC

Internatonal Journal of Grd Dstrbuton Computng [14] algorthms, t has saved the runnng tme, but the classfcaton accuracy s moderate, showng that the algorthm n ths paper s effectve. Table. Comparson of the Proposed, Dstrbuted and Centralzed Record number of test nformaton 1000 000 3000 Support Vector Machne centralzed dstrbuted centralzed dstrbuted centralzed dstrbuted Suppor t Vector Tme (sec) Accuracy (%) 15 7.15 89.3 184 0.34 79.14 194 17.47 69.15 915 54.9 89.5 13 49.4 8.35 318 35.6 79.5 731 156.34 88.9 851 130.9 80.18 917 1056.16 75.6 From Table t can be found that when the data s n dstrbuted envronment, the proposed by ntroducng penalty factor only has to summarze data set and supports vector propertes characterstc values for aggregaton; the amount of data t needs s far less than the amount of dstrbuton record data. Meanwhle, heren has hgher accuracy than the centralzed and dstrbuted. From the overall performance pont of vew, by conductng book nformaton mnng n envronment proposed by ths paper has more advantages than that of the centralzed and dstrbuted n real envronment. 6. Concluson How to dg out the useful nformaton n the cloud computng envronment s the focus of current research, but because the data n cloud computng tself s dynamc wth uneven dstrbuton, the author has proposed to ntroduce a penalty factor n whle n the cloud computng model Map / Reduce to adopt three optons algorthm, whch to a certan extent has mproved the accuracy of the nformaton mnng under cloud computng envronment; the experment proved that mnng algorthms could acheve good results n cloud computng mnng resources. References [1] C. Mao, Web data mnng based on cloud computng, Computer Scence, vol. 8, no. 10, (011), pp. 146-149. [] Y. Y, Data mnng based on cloud computng, Mcroelectroncs and Computer, vol. 30, no., (013), pp. 161-164. [3] D. Jng, Data mnng servce model n cloud computng envronment, Computer Scence, vol. 39, no. Copyrght c 015 SERSC 39

Internatonal Journal of Grd Dstrbuton Computng 6, (01), pp. 17-19. [4] Z. Youln, Analyss of cloud servces n data mnng, Intellgence theory and practce, vol. 35, no. 9, (01), pp. 33-36. [5] G. Xn, A data mnng algorthms n dynamc cloud model, Mn-Mcro Systems, vol. 34, no. 1, (013), pp. 749-754. [6] Y. La, Parallel data mnng method based on Hadoop cloud platform, vol. 5, no. 5, (013), pp. 934-944. [7] W. Xangx, Assessment of web health status based on support vector machne and the cloud model, Bejng Unversty of Posts and Telecommuncatons, vol. 35, no. 1, (01), pp. 10-14. [8] L. Xuezh, Classfcaton predct applcaton of dstrbuted support vector machne under cloud computng platform n the coal ndustry, Coal technology, vol. 3, no. 11, (013), pp. 48-50. [9] L Zhen, Improved Map-Reduce model n cloud computng envronment, Computer Engneerng and Applcatons, vol. 38, no. 1, (01), pp. 7-30. [10] L. Jng, Map-Reduce job schedulng algorthm based on mproved leapfrog strategy, Computer Applcatons Research, vol. 30, no. 7, (013), pp. 1999-00. [11] D. Lnln, Massve data effcent Skylne query processng based on Map-Reduce, Journal of Computers, vol. 34, no. 10, (011), pp. 1785-1795. [1] C. Zhenzhou, Mult-tree expresson of curve, Surveyng and Mappng, vol. 4, no. 4, (013), pp. 60-606 [13] Q. Yhu, Telecommuncatons ndustry customer churn predcton based on random forests and sngle class support vector machnes, Xamen Unversty (Natural Scence Edton), vol. 5, no. 5, (013), pp. 603-605. [14] Q. Tao, Improved support vector machne applcatons n the telecommuncatons customer loss forecasts, Computer Smulaton, vol. 8, no. 7, (011), pp. 39-33. Authors Lu Shuhong (1980.-), female, senor lecturer, research team leader, graduate; research drecton: software technology, network securty. 论文题目所属主题一种改进的在云计算信息挖掘中的研究其他主题第一作者姓名吕树红职称 / 学位高级讲师单位正德职业技术学院邮编 11106 地址南京市江宁经济开发区将军大道 18 号电话手机 1805755807 Emal 171375403@qq.com 第二作者姓名单位职称 / 学位邮编地址电话手机 Emal 40 Copyrght c 015 SERSC