An Efficient Network Traffic Classification Based on Unknown and Anomaly Flow Detection Mechanism
G. Suganya, M.Sc., B.Ed.
M.Phil. Scholar, Department of Computer Science, KG College of Arts and Science, Coimbatore, Tamil Nadu, India.
Abstract
Traffic classification is an important tool for network and system security in environments such as cloud computing. Modern methods aim to take advantage of flow statistical features and machine learning, but their performance is affected by limited supervised information and by unknown applications. In addition, earlier approaches do not consider the detection of anomalies at the flow level. The current work proposes flow-level anomaly detection within the framework of unknown flow detection. Flow-level anomalies can be detected using a synthetic flow-level trace generation approach (SG FLT). The two major challenges with such an approach are to characterize normal and anomalous network behavior, and to discover realistic models that define normal and anomalous behavior at the flow level. Unknown flow detection is performed by flow label propagation and by finding correlated flows to boost the accuracy. Performance evaluation on real-world network datasets demonstrates that the proposed scheme performs more efficiently than existing methods in complex network environments.
Keywords: Traffic classification, unknown flow detection, anomaly flow detection, compound classification
I. INTRODUCTION
Traffic classification plays a significant part in network management architectures [1] and modern network security. For example, traffic classification is normally an important component in products for intrusion detection [2] and QoS control [3]. With the advent of cloud computing, the number of applications deployed on the Internet is rapidly increasing, and many applications adopt encryption techniques. This makes it harder to classify flows according to the applications that generate them.
Conventional systems depend on checking the well-known port numbers used by different applications, or on examining application signature strings in the payload of IP packets. These methods encounter several problems in current networks, such as user privacy protection, dynamic port numbers, and data encryption. Presently, methods tend to classify traffic by examining flow-level statistical properties [4]. Considerable attention has been paid to applying machine learning to classification based on flow statistical features. Still, the performance of existing classification based on flow statistical features is unconvincing in real-world settings. Several unsupervised clustering algorithms and supervised classification algorithms have been applied to network traffic classification. In supervised classification [4], the flow classification model is learned from the labelled training samples of each predefined class. This approach classifies every flow into one of the predefined classes; consequently, it cannot deal with unknown flows produced by unknown applications. Furthermore, to attain high accuracy, supervised methods need enough labelled training data. In contrast, clustering-based methods [5] can automatically group a set of unlabelled training samples and apply the clustering results to build a classifier. In these methods, however, the number of clusters has to be set large enough to attain high-purity clusters, and mapping a large number of clusters to a small number of real applications without supervised information is a hard problem. The existing methods therefore perform poorly in the critical situation where supervised information is inadequate and many unknown flows with anomalies are present.
Recently, benign and malicious anomalies such as network outages, worms, denial-of-service attacks, and flash crowds have had the potential to disrupt critical services and infrastructures. Inspired by the observation that detection at the network edge is not suitable for containing such large-scale attacks, numerous anomaly detection systems for backbone networks have been developed.
These systems work on data collected at the flow level, since inspecting single packets is not possible on high-speed backbone links. Because annotation, anonymization, and modification of real traces all fail to produce benchmark evaluation traces, there is a strong need for another approach. We believe that synthetic generation of benchmark traces has the potential to address these three problems. We envision a synthetic trace generation scheme that yields normal flows following a baseline model, and anomalous flows constructed from a variety of anomaly models. However, the challenge of generating synthetic flow-level traces is twofold: first, we need to describe what is considered normal and anomalous network behavior, and then we need to find suitable normal and anomalous models. We propose an event-driven method for defining normal and anomalous network behavior, and outline the requirements for a new flow-based network model aimed at anomaly detection.
ISSN: 2231-2803 http://www.ijttjournal.org Page187
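The envisioned scheme (normal flows from a baseline model plus anomalous flows from anomaly models) can be illustrated with a deliberately simple toy generator. Everything in it is an assumption for illustration only: the function name `synthetic_flow_trace`, the uniform random baseline, the address ranges, and the single DoS-like anomaly model are hypothetical and do not reproduce the event-driven models proposed in this work. The `intensity` parameter controls how many anomalous flows are injected relative to normal traffic.

```python
import random

def synthetic_flow_trace(n_normal, intensity, victim=("10.0.0.99", 80), seed=1):
    """Toy synthetic flow-level trace (hypothetical sketch, not SG FLT itself).

    Normal flows follow a simple random baseline; a DoS-like anomaly adds
    int(intensity * n_normal) small flows aimed at one victim address/port.
    """
    rng = random.Random(seed)
    trace = []
    for _ in range(n_normal):
        packets = rng.randint(1, 50)                        # volume metric
        trace.append({
            "src_ip": f"192.168.{rng.randint(0, 255)}.{rng.randint(1, 254)}",
            "dst_ip": f"10.0.0.{rng.randint(1, 50)}",       # spatial metrics
            "dst_port": rng.choice([53, 80, 443]),
            "packets": packets,
            "bytes": packets * rng.randint(40, 1500),
            "label": "normal",
        })
    for _ in range(int(intensity * n_normal)):              # hypothetical anomaly model
        trace.append({
            "src_ip": f"172.16.{rng.randint(0, 255)}.{rng.randint(1, 254)}",
            "dst_ip": victim[0],
            "dst_port": victim[1],
            "packets": 1,
            "bytes": 40,
            "label": "anomaly",
        })
    rng.shuffle(trace)                                      # interleave normal and anomalous flows
    return trace
```

Varying `intensity` while keeping `seed` fixed yields traces that differ only in the strength of the injected anomaly, which is the kind of controlled, labelled benchmark that real traces cannot easily provide.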
II. RELATED WORK
Several supervised classification algorithms have been applied to traffic classification, taking into account various network applications and situations. In [4], Moore and Zuev presented a Naive Bayes estimator to classify traffic by application. Specifically, their work capitalizes on hand-classified network data, using it as input to a supervised Naive Bayes estimator, and shows that high classification accuracy is achievable with this estimator. In [6], Kim et al. evaluated the port-based CoralReef method, seven common statistical-feature-based methods using supervised algorithms, and the behaviour-based BLINC method on seven different traces. This study generated several insights. For example, the effectiveness of port-based classification in identifying legacy applications is quite impressive, and it is further supported by the use of packet size and TCP flag information. This explains why research attention has shifted to detecting and classifying new applications that use port camouflage and encryption, i.e., that intentionally try to avoid classification. Unfortunately, growing attention to classifying traffic for purposes not necessarily accepted by the originator of the traffic is likely to increase this category of traffic, provoking an arms race between those trying to classify traffic and those trying to avoid having their traffic classified. In [7], Lorier et al. grouped flows into a smaller number of clusters using the expectation maximization (EM) algorithm and manually labelled each cluster with an application. The clusters are functional, and the clustering and classification algorithms indicate that a good fit to the data has been attained. A first examination indicates that the clusters are stable over a range of different data with similar overall features. The resulting clusters offer another way to disaggregate a packet header stream, and they are expected to prove useful in analyses that focus on a specific traffic type, for instance, simulation of TCP optimisations for high-performance bulk transfer.
However, additional work is needed to fully meet the original objective of clustering traffic into groups that a network manager would recognize as related to the specific application types on their network. In [8], Bernaille et al. applied the k-means algorithm to the clustering of flows and labelled the clusters with applications by means of a payload analysis tool. The early results obtained with the method on a small trace are promising. The method is attractive because it allows early identification of applications and is fairly simple. Still, the method has certain limitations. Most of these limitations are easy to overcome, while others are more important and affect most classification methods to date. One example concerns multi-homed networks: large networks frequently have several connections to the Internet. In this case, the approach can be extended to monitor all access links and aggregate the information on a machine where the classification takes place. In [9], Erman et al. used unidirectional statistical features to classify traffic in the network core. Classification at the core is challenging because only incomplete information about the flows and their contributors is available. This problem can be addressed by developing a framework that can classify a flow using only unidirectional flow information. The approach was evaluated using recent packet traces that were collected and pre-classified to establish a ground truth.
III. PROPOSED WORK
In the proposed system, a flow-level anomaly detection method is used within the framework of unknown flow detection. The synthetic generation of flow-level traces (SG FLT) method is used to detect flow-level anomalies in the network. In this method, normal and anomalous flow models are examined, and an event-driven approach is defined to distinguish normal from anomalous flows.
A. SYSTEM ARCHITECTURE
Initially, the system model shown in Figure 1 has been developed for finding unknown flows in a network based on flow correlation.
Anomalous flows are detected using synthetic generation of flow-level traces, which is based on an event-driven approach. The detected unknown and anomalous flows are then clustered by means of the k-means algorithm. In the training phase, a small number of labelled flows and a large number of unlabelled flows are combined to form an unsupervised training data set for clustering. The characterized flows are used to train a nearest-neighbour classifier over the cluster centroids. In the testing stage, compound classification of correlated flows is performed instead of classifying individual flows. A comparative analysis of the proposed system is carried out using real-time datasets.
Figure 1: Proposed system architecture (components: labelled, unlabelled, and anomaly flows; flow-label propagation; synthetic generation of flow-level traces; clustering and compound analysis; training and testing stages)
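The testing stage described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: test flows are taken to be correlated when they share the same 3-tuple {destination IP, destination port, transport protocol} (the heuristic used for label propagation in Section IV-A), each class is represented by a list of cluster centroids that are simply given as input, and the helper names `nearest_cluster_class` and `compound_classify` are hypothetical. Each flow is first labelled by its nearest centroid, and every flow in a correlated group then receives the group's majority label.

```python
import math
from collections import Counter, defaultdict

def nearest_cluster_class(x, class_centroids):
    """Return the class owning the centroid closest to feature vector x."""
    best_class, best_dist = None, math.inf
    for cls, centroids in class_centroids.items():
        for m in centroids:
            d = math.dist(x, m)                  # Euclidean distance
            if d < best_dist:
                best_class, best_dist = cls, d
    return best_class

def compound_classify(flows, class_centroids):
    """flows: list of (three_tuple, feature_vector).

    Flows sharing a 3-tuple form one correlated group; every flow in the
    group gets the label predicted most often for the group's members.
    """
    groups = defaultdict(list)
    for idx, (key, _) in enumerate(flows):
        groups[key].append(idx)
    labels = [None] * len(flows)
    for members in groups.values():
        votes = Counter(nearest_cluster_class(flows[i][1], class_centroids) for i in members)
        winner = votes.most_common(1)[0][0]      # ties: the class counted first wins
        for i in members:
            labels[i] = winner
    return labels
```

For example, with centroids `{"web": [(0.0, 0.0)], "p2p": [(10.0, 10.0)]}`, three flows to the same 3-tuple with features (0.5, 0.2), (6.0, 6.0) and (1.0, 1.0) are all labelled "web", even though (6.0, 6.0) on its own is nearer the "p2p" centroid. This is how correlation information can correct individual misclassifications.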
IV. METHODOLOGY
A. FLOW LABEL PROPAGATION
The proposed method classifies flows based on flow-level statistical properties. A flow consists of successive IP packets having the same 5-tuple: {source IP, source port, destination IP, destination port, transport protocol}. Traffic flows are constructed by inspecting the headers of IP packets captured on a computer network. We start with a small set of pre-labelled flows to create a supervised data set for cluster-application mapping and a training set for clustering. Suppose the labelled flow set is $A = \{\mathbf{x}^a_1, \mathbf{x}^a_2, \ldots\}$ with labels $L^a = \{y^a_1, y^a_2, \ldots\}$, where each flow is a real vector in the statistical feature space whose dimension equals the number of flow statistical properties, and let $B = \{\mathbf{x}^b_1, \mathbf{x}^b_2, \ldots\}$ be a large set of unlabelled flows in the target network. The anomalous flows in the target network are denoted $C = \{\mathbf{x}^c_1, \mathbf{x}^c_2, \ldots\}$. Merging the labelled, unlabelled, and anomalous flow sets gives the training set $T$ for clustering:
$$T = A \cup B \cup C$$
Moreover, an automatic process is applied to extend the labelled flow set by searching for correlated flows among $A$, $B$, and $C$. For each flow $\mathbf{x}^a$ in $A$, the process searches $B$ and $C$ for correlated flows with the same 3-tuple: {destination IP, destination port, transport protocol}. The following algorithm performs flow label propagation using this 3-tuple heuristic:
Input: small labelled flow set $A$ and its label set $L^a$; large unlabelled flow set $B$; anomaly flow set $C$
Output: extended labelled flow set $E$ and its label set $L^e$
1. $E \leftarrow A$
2. $L^e \leftarrow L^a$
3. for $i \leftarrow 1$ to $|A|$ do
4.   for each flow $\mathbf{x}_j \in B \cup C$ do
5.     compare the 3-tuples of $\mathbf{x}^a_i$ and $\mathbf{x}_j$
6.     if $\mathbf{x}^a_i$ and $\mathbf{x}_j$ share the same 3-tuple then
7.       put $\mathbf{x}_j$ into $E$
8.       put $y^a_i$ into $L^e$ // $y^a_i$ becomes the label of $\mathbf{x}_j$
9.     end if
10.   end for
11. end for
B. SYNTHETIC GENERATION OF FLOW-LEVEL TRAFFIC TRACES (SG FLT)
Once normal and anomalous network behavior have been defined, the next step is to find appropriate models that give a realistic description of normal and anomalous traffic. A general principle in modelling is that each model should be designed for a specific purpose; in the present work, that purpose is generating benchmark traces for flow-level anomaly detection. The models must meet the following requirements.
Traffic model timescale: anomaly detection systems operate at timescales ranging from minutes to several days. Therefore, the model must capture the long-timescale as well as the short-timescale behavior of the network.
Modelling characteristics of flows: flow trace generation involves modelling the diverse flow parameters that are regularly used for anomaly detection, namely parameters used by volume metrics, such as packets and bytes, and parameters used by spatial metrics, such as source and destination addresses and ports.
Versatile and realistic anomaly models: to generate benchmark traces, the model must be able to generate anomalies of varying intensities, and it needs to consider the impact of an anomaly on the network under normal conditions.
C. NEAREST CLUSTER BASED CLASSIFIER
In this step, k-means clustering is used to construct the nearest cluster based classifier. The aim of k-means clustering is to partition the flows into $k$ clusters $C_1, \ldots, C_k$ so as to minimize the within-cluster sum of squares:
$$\arg\min_{C_1,\ldots,C_k} \sum_{i=1}^{k} \sum_{\mathbf{x} \in C_i} \lVert \mathbf{x} - \mathbf{m}_i \rVert^2 \tag{1}$$
where $\mathbf{m}_i$ denotes the centroid of $C_i$, i.e., the mean of the flows in $C_i$. Starting from an initial set of $k$ centroids $\{\mathbf{m}_1^{(1)}, \ldots, \mathbf{m}_k^{(1)}\}$ selected at random, the algorithm alternates between an assignment step and an update step. In the assignment step, each flow is assigned to the cluster with the closest mean:
$$C_i^{(t)} = \{\mathbf{x} : \lVert \mathbf{x} - \mathbf{m}_i^{(t)} \rVert \le \lVert \mathbf{x} - \mathbf{m}_l^{(t)} \rVert \ \text{for all } l = 1, \ldots, k\} \tag{2}$$
In the update step, the new centroid of each cluster is computed as the mean of the flows assigned to it:
$$\mathbf{m}_i^{(t+1)} = \frac{1}{\lvert C_i^{(t)} \rvert} \sum_{\mathbf{x} \in C_i^{(t)}} \mathbf{x} \tag{3}$$
The output of the cluster-class mapping is then used to build a classifier for the testing flows.
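The flow label propagation algorithm and the k-means steps (1)-(3) above can be sketched together in Python. This is an illustrative sketch, not the author's code, and several details are assumptions: the helper names are hypothetical; a flow is represented as a (3-tuple, feature vector) pair; a dictionary keyed on the 3-tuple replaces the nested loop, so each flow in B ∪ C is added to E at most once, with the label of the first labelled flow seen for that 3-tuple (the pseudocode does not say how conflicting labels are resolved); and the cluster-class mapping, which the text does not spell out, is assumed to give each cluster the majority label of the labelled flows nearest to it, marking clusters without labelled flows as "unknown".

```python
import random
from collections import Counter, defaultdict

def propagate_labels(labelled, unlabelled, anomalous):
    """labelled: [(three_tuple, x, y)]; unlabelled, anomalous: [(three_tuple, x)].
    Returns the extended labelled set E as a list of (x, y) pairs."""
    extended = [(x, y) for _, x, y in labelled]          # E <- A, L^e <- L^a
    label_of = {}
    for key, _, y in labelled:
        label_of.setdefault(key, y)                       # assumption: first label per 3-tuple wins
    for key, x in unlabelled + anomalous:                 # search B and C
        if key in label_of:                               # same 3-tuple as a labelled flow
            extended.append((x, label_of[key]))
    return extended

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest(x, centroids):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda c: sq_dist(x, centroids[c]))

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means: random initial centroids, then alternate Eq. (2) and Eq. (3)."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in points:                                  # assignment step, Eq. (2)
            clusters[nearest(x, centroids)].append(x)
        updated = [                                       # update step, Eq. (3)
            tuple(sum(col) / len(members) for col in zip(*members)) if members
            else centroids[i]                             # keep an empty cluster's centroid
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:                          # converged
            break
        centroids = updated
    return centroids

def map_clusters_to_classes(centroids, extended):
    """Assumed mapping: majority label of the labelled flows nearest each centroid."""
    votes = defaultdict(Counter)
    for x, y in extended:
        votes[nearest(x, centroids)][y] += 1
    class_centroids = defaultdict(list)
    for i, m in enumerate(centroids):
        label = votes[i].most_common(1)[0][0] if votes[i] else "unknown"
        class_centroids[label].append(m)
    return dict(class_centroids)
```

In a full pipeline, the points passed to `kmeans` would be the feature vectors of the training set $T = A \cup B \cup C$, with `k` set well above the number of applications so that the clusters stay pure. The returned class-to-centroids dictionary is the per-class centroid set that the nearest cluster based classifier needs.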
Let the classes produced by the cluster-class mapping be denoted $\psi = \{\omega_1, \ldots, \omega_q\}$. The classes are represented using the results of k-means clustering and the flow statistical features: class $\omega_i$ is represented by a set of cluster centroids
$$M_i = \{\mathbf{m}_j : C_j \in \omega_i\}$$
The classification rule for an individual flow $\mathbf{x}$ is:
$$y = \arg\min_{i} \min_{\mathbf{m}_j \in M_i} \lVert \mathbf{x} - \mathbf{m}_j \rVert \tag{4}$$
D. COMPOUND CLASSIFICATION
In this phase, compound classification is applied to the correlated flows instead of classifying individual flows. Consider a set of correlated flows $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_g\}$. To improve accuracy, the correlation information is exploited: the compound classifier aggregates the classifier's flow predictions through weighted measures. For the given flows $X$, the $g$ flow predictions are $y_1, \ldots, y_g$, and they can be straightforwardly transformed into weighted measures:
$$v_i = \sum_{j=1}^{g} \begin{cases} 1, & \text{if } y_j \text{ indicates the } i\text{th class} \\ 0, & \text{otherwise} \end{cases} \tag{5}$$
The compound decision rule using the weighted measures is:
$$\text{Assign } X \to \omega_l \quad \text{if } v_l = \max_{i} v_i \tag{6}$$
Consequently, all flows in $X$ are classified into $\omega_l$.
International Journal of Computer Trends and Technology (IJCTT) volume 10 number 4 Apr 2014
V. EXPERIMENTAL RESULTS
In this section, a number of experiments are carried out to study the impact of unknown applications and anomalous flows on supervised classification methods. Then, two methods are compared: the existing method, evaluated when unknown applications are present in the experiments, and the proposed method using synthetic generation of flow-level traces (SG-FLT).
Two common metrics are used to measure classification performance, both in the scenario with anomalies and unknown applications and in the unknown-application framework. Overall accuracy is the ratio of the number of correctly classified flows to the total number of testing flows:
$$\text{Accuracy} = \frac{\text{number of correctly classified flows}}{\text{total number of testing flows}}$$
The F-measure is calculated as
$$\text{F-measure} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$
where precision is the ratio of correctly classified flows to all flows predicted as a given class (including unknown and anomalous flows), and recall is the ratio of correctly classified flows to all ground-truth flows of that class. The comparative graphs for the classification of unknown flows and of anomalies with the unknown class are shown below.
Figure 2: Accuracy comparison graph with unknown application using SG-FLT (y-axis: accuracy; x-axis: unlabelled and anomaly training flows)
Figure 3: F-measure comparison graph with unknown application using SG-FLT (y-axis: F-measure; x-axis: methods)
The graphs in Figures 2 and 3 show that the proposed system using synthetic generation of flow-level traces (SG-FLT) provides higher accuracy and F-measure than the existing "private with unknown application" method.
VI. CONCLUSION
In the present work, traffic classification has been performed on the basis of flow-level anomaly detection using the synthetic trace generation approach.
The proposed synthetic trace generation approach has the potential to make benchmark traces available to a broad community. A reference model is created that allows normal and anomalous network behavior to be defined. The present method introduces techniques to make adequate use of flow correlation information. First, flow label propagation automatically and accurately labels more unlabelled flows, together with anomalous flows, to enhance the capability of the nearest cluster based classifier (NCC). Second, compound classification combines a number of flow predictions to classify correlated flows more accurately. Experimental results in terms of accuracy and F-measure are better than those of the existing system.
REFERENCES
[1] T. Karagiannis, K. Papagiannaki, and M. Faloutsos, "BLINC: multilevel traffic classification in the dark," SIGCOMM Comput. Commun. Rev., vol. 35, pp. 229-240, Aug. 2005.
[2] Y. Xiang, W. Zhou, and M. Guo, "Flexible deterministic packet marking: an IP traceback system to find the real source of attacks," IEEE Trans. Parallel Distrib. Syst., vol. 20, no. 4, pp. 567-580, Apr. 2009.
[3] M. Roughan, S. Sen, O. Spatscheck, and N. Duffield, "Class-of-service mapping for QoS: a statistical signature-based approach to IP traffic classification," in Proc. 2004 ACM SIGCOMM Conference on Internet Measurement, pp. 135-148.
[4] A. W. Moore and D. Zuev, "Internet traffic classification using Bayesian analysis techniques," SIGMETRICS Perform. Eval. Rev., vol. 33, pp. 50-60, June 2005.
[5] A. McGregor, M. Hall, P. Lorier, and J. Brunskill, "Flow clustering using machine learning techniques," in Proc. 2004 Passive and Active Measurement Workshop, pp. 205-214.
[6] H. Kim, K. Claffy, M. Fomenkov, D. Barman, M. Faloutsos, and K. Lee, "Internet traffic classification demystified: myths, caveats, and the best practices," in Proc. 2008 ACM CoNEXT Conference, pp. 1-12.
[7] P. Lorier, A. McGregor, M. Hall, and J. Brunskill, "Flow clustering using machine learning techniques," in Proc. 2004 Passive and Active Measurement Workshop, pp. 205-214.
[8] L. Bernaille, R. Teixeira, I. Akodkenou, A. Soule, and K. Salamatian, "Traffic classification on the fly," SIGCOMM Comput. Commun. Rev., vol. 36, pp. 23-26, Apr. 2006.
[9] J. Erman, A. Mahanti, M. Arlitt, and C. Williamson, "Identifying and discriminating between web and peer-to-peer traffic in the network core," in Proc. 2007 International Conference on World Wide Web, pp. 883-892.