Improved SVM in Cloud Computing Information Mining

Similar documents
The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network

Forecasting the Direction and Strength of Stock Market Movement

Study on Model of Risks Assessment of Standard Operation in Rural Power Network

On-Line Fault Detection in Wind Turbine Transmission System using Adaptive Filter and Robust Statistical Features

A Dynamic Load Balancing for Massive Multiplayer Online Game Server

Research on Evaluation of Customer Experience of B2C Ecommerce Logistics Enterprises

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Single and multiple stage classifiers implementing logistic discrimination

Performance Analysis and Coding Strategy of ECOC SVMs

Performance Analysis of Energy Consumption of Smartphone Running Mobile Hotspot Application

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

A New Task Scheduling Algorithm Based on Improved Genetic Algorithm

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

RESEARCH ON DUAL-SHAKER SINE VIBRATION CONTROL. Yaoqi FENG 1, Hanping QIU 1. China Academy of Space Technology (CAST)

Open Access A Load Balancing Strategy with Bandwidth Constraint in Cloud Computing. Jing Deng 1,*, Ping Guo 2, Qi Li 3, Haizhu Chen 1

A heuristic task deployment approach for load balancing

Optimization Model of Reliable Data Storage in Cloud Environment Using Genetic Algorithm

A Load-Balancing Algorithm for Cluster-based Multi-core Web Servers

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification

Support Vector Machines

A DATA MINING APPLICATION IN A STUDENT DATABASE

DEFINING %COMPLETE IN MICROSOFT PROJECT

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending

A Programming Model for the Cloud Platform

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

An Interest-Oriented Network Evolution Mechanism for Online Communities

Lei Liu, Hua Yang Business School, Hunan University, Changsha, Hunan, P.R. China, Abstract

"Research Note" APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES *

An Alternative Way to Measure Private Equity Performance

IWFMS: An Internal Workflow Management System/Optimizer for Hadoop

PEER REVIEWER RECOMMENDATION IN ONLINE SOCIAL LEARNING CONTEXT: INTEGRATING INFORMATION OF LEARNERS AND SUBMISSIONS

Network Security Situation Evaluation Method for Distributed Denial of Service

THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network *

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School

The Application of Fractional Brownian Motion in Option Pricing

ECE544NA Final Project: Robust Machine Learning Hardware via Classifier Ensemble

PAS: A Packet Accounting System to Limit the Effects of DoS & DDoS. Debish Fesehaye & Klara Naherstedt University of Illinois-Urbana Champaign

Multiple-Period Attribution: Residuals and Compounding

Gender Classification for Real-Time Audience Analysis System

A New Service Pricing Mechanism based on Coalition Game Theory in

On the Optimal Control of a Cascade of Hydro-Electric Power Stations

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services

Can Auto Liability Insurance Purchases Signal Risk Attitude?

A Secure Password-Authenticated Key Agreement Using Smart Cards

Calculating the high frequency transmission line parameters of power cables

Analysis of Premium Liabilities for Australian Lines of Business

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm

Automated Network Performance Management and Monitoring via One-class Support Vector Machine

Dynamic Resource Allocation for MapReduce with Partitioning Skew

A Performance Analysis of View Maintenance Techniques for Data Warehouses

A Replication-Based and Fault Tolerant Allocation Algorithm for Cloud Computing

Risk Model of Long-Term Production Scheduling in Open Pit Gold Mining

IMPACT ANALYSIS OF A CELLULAR PHONE

Power-of-Two Policies for Single- Warehouse Multi-Retailer Inventory Systems with Order Frequency Discounts

The Greedy Method. Introduction. 0/1 Knapsack Problem

Resource Management and Organization in CROWN Grid

J. Parallel Distrib. Comput.

Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems

AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE

A Study on Secure Data Storage Strategy in Cloud Computing

A Dynamic Energy-Efficiency Mechanism for Data Center Networks

Resource Scheduling in Desktop Grid by Grid-JQA

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION

Load Balancing Algorithm of Switched Dynamic Iteration

Set. algorithms based. 1. Introduction. System Diagram. based. Exploration. 2. Index

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña

Politecnico di Torino. Porto Institutional Repository

The OC Curve of Attribute Acceptance Plans

Damage detection in composite laminates using coin-tap method

Application of an Improved BP Neural Network Model in Enterprise Network Security Forecasting

Rate Monotonic (RM) Disadvantages of cyclic. TDDB47 Real Time Systems. Lecture 2: RM & EDF. Priority-based scheduling. States of a process

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

Genetic Algorithm Based Optimization Model for Reliable Data Storage in Cloud Environment

BERNSTEIN POLYNOMIALS

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008

Statistical Approach for Offline Handwritten Signature Verification

A spam filtering model based on immune mechanism

Recurrence. 1 Definitions and main statements

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP)

Design and Development of a Security Evaluation Platform Based on International Standards

Using Content-Based Filtering for Recommendation 1

The Load Balancing of Database Allocation in the Cloud

Mining Multiple Large Data Sources

Searching for Interacting Features for Spam Filtering

Number of Levels Cumulative Annual operating Income per year construction costs costs ($) ($) ($) 1 600,000 35, , ,200,000 60, ,000

What is Candidate Sampling

Transcription:

Internatonal Journal of Grd Dstrbuton Computng Vol.8, No.1 (015), pp.33-40 http://dx.do.org/10.1457/jgdc.015.8.1.04 Improved n Cloud Computng Informaton Mnng Lvshuhong (ZhengDe polytechnc college JangSu NanJng 11106) 171375403@qq.com Abstract How to have a better mnng and use of nformaton n the cloud computng envronment consttutes the drecton of current research n the feld of cloud computng; ths paper ntroduces support vector machne () concept n the cloud computng data mnng, ntroduces a penalty factor n the, and mproves data mnng algorthms. The constructed Map / Reduce model by the concept of featured mult-tree conducted the valdaton of the model. Smulaton results show that data mnng methods of ths model have effectvely mproved the accuracy and the tme of nformaton mnng, hence have some practcal sgnfcance. Keywords: data mnng; penalty factor; Map / Reduce model 1. Introducton The growng exponentally cloud computng data no doubt poses a severe challenge to cloud computng data mnng [1]. In the cloud computng, the data mnng s based on a comprehensve analyss of dfferent cloud nodes; people fnd the value of the nformaton n the process of nformaton mnng, hence conduct processng nformaton n cloud computng envronment and provde a favorable data analyss for decson-makng departments. How to dg out the nformaton needed for the reader out of vast amounts of data n cloud computng envronment, s a realstc problem ndeed to be resolved. At present, the commonly used data mnng algorthms are neural networks, decson trees and Bayesan networks []; most methods conduct data mnng under dstrbuted envronment, hence these data mnng algorthms take up large storage space, ncrease network load,and extend the response tme [3-4]. Lterature [5] proposed a data mnng algorthm based on dynamc cloud computng model, and smulaton results show that the algorthm has certan advantages n terms of handlng massve amounts of data. Lterature [6] proposed an effcent data mnng framework on Hadoop, usng the database to smulate the chan structure, managng the excavated knowledge. Lterature [7] proposed a cloud computng network health assessment method based on support vector machne. Smulaton results show the feasblty of the method. Lterature [8] by usng the support vector machnes n coal ndustry under the cloud computng condtons, effectvely conducted mnng and analyss for coal data and acheved some results.. Related Knowledge.1 Support Vector Machnes Support vector machne (referred to as ) s a machne learnng method based on the VC theory, whose basc dea s to rely on a hyperplane as a decson, and make the maxmum dstance between the postve and negatve modes. Relyng on the optmal classfcaton surface, has been bult up; the optmal classfcaton surface has correctly separated the classfed surfaces, whle mantanng the maxmum spacng between classfcaton surfaces; the vector closest to the optmal classfed hyperplane becomes ISSN: 005-46 IJGDC Copyrght c 015 SERSC

Internatonal Journal of Grd Dstrbuton Computng support vector. The optmal partton hyperplane s calculate and descrbed as a condtonal extremum problem, obtanng the optmal soluton by Lagrange functon saddle pont: 1 Lp w C a{ y ( ( x ) w b) 1 } (1) By kernel functon theorem, the optmal soluton must satsfy the followng condtons w a y( x ) () 0 a C, a y 0 (3) In equaton (), under normal crcumstances, the default value of a s 1, the result of w s nonzero of x. The support vector machne takes the tranng data set as a unt to descrbe the features of unt of data; n aspects of makng the equvalent classfcaton and segmentaton data sets, vector machne occupes a very small part.. Data Mnng n the Cloud Computng Envronment Currently, data mnng n the cloud computng envronments reles on web logs as a study, by Web data mnng technques to understand and analyze the daly behavor of the user; log mnng n the cloud computng envronment s dvded nto: (1) classfcaton analyss, manly conducts classfcaton and analyss of user behavor characterstcs under cloud computng condtons; () assocaton rules analyss, to analyze the user pages behavor under cloud computng envronment; (3) frequent sequence access analyss: to analyze users frequently access under cloud computng condton, hence obtan access behavor patterns of users n the cloud computng, understand n-depth mportance of the relevant page, and mprove the operatng effectveness of Web ste; but because of the cloud tself has a broad dstrbuton, t tends to have large volumes of data, and s wdely dstrbuted wth heterogeneous structure, and the data s n real-tme transformaton, hence has greatly ncreased the complexty of the data structure, and brngs dffculty to data mnng n cloud computng. Especally n the data mnng process, the constant and multple access to databases presents new requrements to data mnng n cloud computng envronment..3 Map / Reduce Model Map / Reduce model n cloud computng envronment s a dstrbuted model; n dealng wth a large amount of data t has been wdely used [9]. Prncple of Map / Reduce model s by usng Map functon to dvde the nput process nto a large number of data peces, and the data segment unts wll be delvered to the computer for processng; by Reduce functon the processed data fragments wll be ntegrated together, and fnally output a summary; n Map / Reduce model, Master s responsble for task schedulng and sharng the data between the node; Worker s prmarly responsble for task processng and data processng n cloud computng [10-11]. As shown n Fgure 1 34 Copyrght c 015 SERSC

Internatonal Journal of Grd Dstrbuton Computng Userprogram message segmentaton desgntaonmap segmentaon Man-control program desgnaton Reduce segmentaton Fragment0 Fragment1 Workmachne Workmachne Local wrte Remote mreadng Workmachne Workmachne wrte wrte Output Fle Output Fle Fragment Fragment3 Fragment4 Workmachne Input Fle Map state Intermedtae Fle Reduce state Output Fle Fgure 1. Map / Reduce Model 3. n Cloud Computng Model Envronment 3.1 Support Vector Machnes wth the Introducton of Penalty Factor Ths paper has frst ntroduced n the penalty factor, assumng that x represents the nput vector, F s hgh-dmensonal nner product space; feature vector z F, z ( x), n whch s the characterstc functon, kernel functon s k( x, xl ) ( x ) ( xl ). Then optmzaton problems related to optmzaton features are: mn1 s. t w z b 1 I w z b 1 j J w j (4) In Equaton (4), z means lnearly separable; after the ntroducton of the penalty factor, related optmzaton problems have been transferred as: c mn1 w k s. t w z b 1 I (5) w z b 1 j J j 3. Introducton of Dstrbuted Support Vector Machnes n Cloud Computng Model When the data s large-scale n cloud computng system, the calculatng process of requres a very large amount of calculaton, therefore, the data n ths paper proposes usng cloud computng-based dstrbuted classfcaton algorthm for large-scale, complex cloud data, and the algorthm steps are as follows: (1) assgn data records of cloud computng envronment to dfferent computng nodes, and use the target equaton of the mproved support vector machnes functon: Copyrght c 015 SERSC 35

Internatonal Journal of Grd Dstrbuton Computng c f ( x ) mn1 w x (6) k () In each computng nodes, respectvely, to calculate the equaton value of for each node a y( x ), y( x) 以 及 x k x (3) After the calculaton of each node, merge summaton the correspondng ntermedate equatons values of each computng node, and accordng to the Lagrange functon factor vectors to obtan partal dfferental evaluaton of three mddle equatons, namely: n f 0 x a y( x ) x 1 n f 0 y y ( x) (7) y 1 f 0 x x x k (4) Accordng to (3), the expresson result s 0, resultng n the value of the optmal soluton x. 4. Mnng Algorthms Introduced Support Vector Machne based on Map / Reduce Model 4.1 Algorthm Steps Descrpton The algorthm s dstrbuted computng framework based on Map / Reduce model, usng three selecton algorthm to obtan overall frequent tem sets of data; n the cloud computng envronment the documentaton set s the documents collecton to be dealt wth under the assumpton n the cloud computng envronment. Take Dn { txt1, doc, excel 3 } as an example, steps are as follows: Step 1: Accordng to the contents of the document collecton, usng the Map functon to convert the data block f ( D ) c, D ; through Reduce functon t s expressed as n f ( D ) c, D, c, f ( D ) n n Step : n the output c, f ( D ) we ntroduce equaton (6) to calculate object equaton for each document n n D ; and ntroduce equaton (7) to calculate the optmal soluton set of the document: Dxn { txtx 1, docx, excel x3 }. Step 3: take f( D xn) n Step as the second entry of Map functon, and then get Map functon processng data block f ( Dxn ) cx, f ( Dxn ) Step 4: Repeat steps -3 untl teratons reach 4. 4. Algorthm Implementaton In ths paper, we use three selectons algorthm n the Map / Reduce model to get overall frequent tem sets, the algorthm mplementaton s as follows: 1: A varety of documents n cloud computng, D n ={txt 1,doc,excel 3 },N=3 :whle(n<3) 3:{Map(); 4: for(=1;<n;++) 36 Copyrght c 015 SERSC

Internatonal Journal of Grd Dstrbuton Computng 4.3 Algorthms Example 5: Fre()=Fre(-1)+1; 6: ouput(doc, Fre(<c x,doc x >)); 7: λ j =λ.frequency(); 8: Endfor 9: EndMap 10:Reduce() 11:Input(doc, Fre(<c x,doc x >)); 1:for(c x record(doc x )) 13:output(c x,fre(c x,doc x >)); 14:endReduce; c 15: f ( cx ) mn1 w x k 16: Fre(c x ) c x 17:N=N+1 18:EndWhle To further llustrate the usefulness of model n ths paper, the author ntroduced of the concept of characterstcs mult-tree from Lhe lterature [1] : nodes (excludng leaf nodes) of each layer of characterstcs mult-tree correspond to a collecton of the feature propertes; the sde corresponds to dfferent values of the data characterstcs, and data characterstcs s correspondng the sdes of mult-tree. The result of node storage s the value of mult-tree; each node n the mult-tree saves a certan value and the number of records. Steps are as follows: 1: Input: tranng samples are dstrbuted n subnodes of cloud computng, labeled as X = {X1, X... Xn}, reflectng support vector collecton ET of X1, X... Xn; node weghts n descendng order, E = {E1, E... En}, : Output: N of cloud feature mult-tree 3: Content: :For k=1 to N 4: If(k==1) 5: / / from N to get all Count (N) values of E1 and the number of records, and then create a node Enode accordng to statstcs n n 6: n 1 1 N 7:For k=1 to Count(N) 8: ENode kj (value,count)=createtree(e N, ) 9:EndFo 10:else 11:// Get the statstcal value of ET that meet the condtons of each node C(N, and the number of records; create node ENode kj based on the results of values respectvely. 1:For k=1 to Count(N) 13: For j=1 to C(N) 14: ENode kj [Count]=CreateTree(E k,, n ) 15: N+1 =CreateTree(ET ) Copyrght c 015 SERSC 37

Internatonal Journal of Grd Dstrbuton Computng 16: f ( n) 1 n1 N 17: EndFor 18:EndFor 5. Expermental Smulaton n Ths algorthm uses Cloudsm platform and selects network centry of the author s nsttute; the author has selected three supermarket near the school as cloud, and the supermarket nformaton wll be seen as a classfcaton problem; set N gven tranng samples f (, ) m x y,n whch x represents the nput sample, y means the sample output; usually -1 means loss means and +1 ndcates the presence. Due to retrevng books under cloud condtons related to the dynamc nature, and dffcult to be captured, support vector machne adopts the kernel functon k( x, y) exp( x y ) /, where m x, j 1 y. Extract the overall books nformaton data and number of readers m obtaned from several lbrares forecasts : the author extracted the man ngredent values to obtan data sets, and collected overall support vector data sets as tranng data, usng support vector machne algorthm for global mnng. Focused on the nformaton for each supermarket, the author extracted accordng to a certan proporton 1000,000,3000 records of product nformaton as measurement sample evenly dstrbuted n three dfferent databases, usng Lterature [13], [14] and the algorthm of ths paper to compare ther vector machne number, operaton duraton, classfcaton accuracy rate, and the results are shown n Table 1. Ths paper compares and dstrbuted, centralzed, as shown n Table. Table 1. Comparson between Dstrbuted Algorthm Name Supermarket 1 Product Informaton Supermarket Product Informaton Supermarket 3Product Informaton Algorthm Lterature[13 ] algorthm Lterature[14 ] algorthm algorthm Lterature[13 ] algorthm Lterature[14 ] algorthm algorthm Lterature[13 ] algorthm Lterature[14 ] algorthm algorthm Support Vector Run Tme (sec) Classfcaton accuracy 70 19.34 63.7% 75 15.3 68.37% 85 1.05 80.15% 55 36.1 49.18% 6 3.14 53.7% 67 8.96 80.19% 80 56.1 70.1% 84 39.7 73.19% 87 0.3 85.7% From Table 1 t s found that, when compared algorthm wth Lterature [13], 38 Copyrght c 015 SERSC

Internatonal Journal of Grd Dstrbuton Computng [14] algorthms, t has saved the runnng tme, but the classfcaton accuracy s moderate, showng that the algorthm n ths paper s effectve. Table. Comparson of the Proposed, Dstrbuted and Centralzed Record number of test nformaton 1000 000 3000 Support Vector Machne centralzed dstrbuted centralzed dstrbuted centralzed dstrbuted Suppor t Vector Tme (sec) Accuracy (%) 15 7.15 89.3 184 0.34 79.14 194 17.47 69.15 915 54.9 89.5 13 49.4 8.35 318 35.6 79.5 731 156.34 88.9 851 130.9 80.18 917 1056.16 75.6 From Table t can be found that when the data s n dstrbuted envronment, the proposed by ntroducng penalty factor only has to summarze data set and supports vector propertes characterstc values for aggregaton; the amount of data t needs s far less than the amount of dstrbuton record data. Meanwhle, heren has hgher accuracy than the centralzed and dstrbuted. From the overall performance pont of vew, by conductng book nformaton mnng n envronment proposed by ths paper has more advantages than that of the centralzed and dstrbuted n real envronment. 6. Concluson How to dg out the useful nformaton n the cloud computng envronment s the focus of current research, but because the data n cloud computng tself s dynamc wth uneven dstrbuton, the author has proposed to ntroduce a penalty factor n whle n the cloud computng model Map / Reduce to adopt three optons algorthm, whch to a certan extent has mproved the accuracy of the nformaton mnng under cloud computng envronment; the experment proved that mnng algorthms could acheve good results n cloud computng mnng resources. References [1] C. Mao, Web data mnng based on cloud computng, Computer Scence, vol. 8, no. 10, (011), pp. 146-149. [] Y. Y, Data mnng based on cloud computng, Mcroelectroncs and Computer, vol. 30, no., (013), pp. 161-164. [3] D. Jng, Data mnng servce model n cloud computng envronment, Computer Scence, vol. 39, no. Copyrght c 015 SERSC 39

Internatonal Journal of Grd Dstrbuton Computng 6, (01), pp. 17-19. [4] Z. Youln, Analyss of cloud servces n data mnng, Intellgence theory and practce, vol. 35, no. 9, (01), pp. 33-36. [5] G. Xn, A data mnng algorthms n dynamc cloud model, Mn-Mcro Systems, vol. 34, no. 1, (013), pp. 749-754. [6] Y. La, Parallel data mnng method based on Hadoop cloud platform, vol. 5, no. 5, (013), pp. 934-944. [7] W. Xangx, Assessment of web health status based on support vector machne and the cloud model, Bejng Unversty of Posts and Telecommuncatons, vol. 35, no. 1, (01), pp. 10-14. [8] L. Xuezh, Classfcaton predct applcaton of dstrbuted support vector machne under cloud computng platform n the coal ndustry, Coal technology, vol. 3, no. 11, (013), pp. 48-50. [9] L Zhen, Improved Map-Reduce model n cloud computng envronment, Computer Engneerng and Applcatons, vol. 38, no. 1, (01), pp. 7-30. [10] L. Jng, Map-Reduce job schedulng algorthm based on mproved leapfrog strategy, Computer Applcatons Research, vol. 30, no. 7, (013), pp. 1999-00. [11] D. Lnln, Massve data effcent Skylne query processng based on Map-Reduce, Journal of Computers, vol. 34, no. 10, (011), pp. 1785-1795. [1] C. Zhenzhou, Mult-tree expresson of curve, Surveyng and Mappng, vol. 4, no. 4, (013), pp. 60-606 [13] Q. Yhu, Telecommuncatons ndustry customer churn predcton based on random forests and sngle class support vector machnes, Xamen Unversty (Natural Scence Edton), vol. 5, no. 5, (013), pp. 603-605. [14] Q. Tao, Improved support vector machne applcatons n the telecommuncatons customer loss forecasts, Computer Smulaton, vol. 8, no. 7, (011), pp. 39-33. Authors Lu Shuhong (1980.-), female, senor lecturer, research team leader, graduate; research drecton: software technology, network securty. 论 文 题 目 所 属 主 题 一 种 改 进 的 在 云 计 算 信 息 挖 掘 中 的 研 究 其 他 主 题 第 一 作 者 姓 名 吕 树 红 职 称 / 学 位 高 级 讲 师 单 位 正 德 职 业 技 术 学 院 邮 编 11106 地 址 南 京 市 江 宁 经 济 开 发 区 将 军 大 道 18 号 电 话 手 机 1805755807 Emal 171375403@qq.com 第 二 作 者 姓 名 单 位 职 称 / 学 位 邮 编 地 址 电 话 手 机 Emal 40 Copyrght c 015 SERSC