Botnets Detection Based on IRC-Community




Wei Lu and Ali A. Ghorbani
Network Security Laboratory, Faculty of Computer Science
University of New Brunswick, Fredericton, NB E3B 5A3, Canada
{wlu, ghorbani}@unb.ca

Botnets are networks of compromised computers controlled under a common command and control (C&C) channel. Recognized as one of the most serious security threats to the current Internet infrastructure, botnets are often hidden in existing applications, e.g. IRC, HTTP, or Peer-to-Peer, which makes botnet detection a challenging problem. Previous attempts at detecting botnets either examine traffic content for IRC commands on selected network links or set up honeypots. In this paper, we propose a new approach for detecting and characterizing botnets on a large-scale WiFi ISP network. We first classify the network traffic into different applications by using payload signatures and a novel clustering algorithm, and then analyze the specific IRC application community based on the temporal-frequent characteristics of flows, which differentiates malicious IRC channels created by bots from normal IRC traffic generated by human beings. We evaluate our approach with over 60 million flows collected over five consecutive days on a large-scale network, and the results show that the proposed approach detects the botnet flows with a high detection rate and an acceptably low false alarm rate.

I. INTRODUCTION

One of the biggest threats to the current Internet infrastructure is botnets, which usually comprise large pools of compromised computers under the control of a botmaster. Botnets can be centralized, distributed or peer-to-peer (P2P) according to different command and control (C&C) models and different communication protocols (e.g. HTTP, IRC or P2P). The attacks conducted by botnets are very diverse, ranging from Distributed Denial-of-Service (DDoS) attacks to e-mail spamming, keylogging, click fraud, and new malware spreading.
In Figure 1, we illustrate a typical life-cycle of a botnet and its attacking behaviours.

[Fig. 1. Typical life-cycle of an IRC based botnet and its attacking behaviors. Steps shown: 1. exploit; 2. bot download; 3. DNS query; 4. join; 5. pass authen.; 6. pass; 7. command; 8. DDoS. Nodes: botmaster, vulnerable host, DNS server, IRC server, botnet, victim server.]

The botmaster usually finds a new bot by exploiting its vulnerabilities remotely. Once infected, the bot downloads and installs the binary code by itself. After that, each bot on the botnet attempts to find the IRC server address by DNS query, which is illustrated in Step 3 of Figure 1. Next is the communication step between bots and the IRC server. In the IRC based communication mechanism, a bot first sends a PASS message to the IRC server to start a session, and the server then authenticates the bot by checking its password. In many cases, the botmaster also needs to authenticate itself to the IRC server. Upon the completion of these authentications, the command and control channels among botmaster, bots, and IRC server are established. To start a DDoS attack, the botmaster only needs to send a simple command like ".ddos.start victim_ip"; all bots receive this command and start to attack the victim server. This is shown in Step 8 of Figure 1. More information about the botmaster command library can be found in [1].

Detecting botnet traffic is a very challenging problem, because: (1) botnets use existing application protocols, and thus their traffic volume is small and very similar to normal traffic behaviour; (2) classifying traffic applications becomes more challenging due to traffic content encryption and the unreliability of the destination port labelling method. Previous attempts at detecting botnets are mainly based on honeypots [2,3,4,5,6], passive anomaly analysis [7,8,9] and traffic application classification [10,11,12]. Setting up and installing honeypots on the Internet is very helpful for capturing malware and understanding the basic behaviours of botnets.
Passive anomaly analysis for detecting botnets in network traffic is usually independent of the traffic content and has the potential to find different types of botnets (e.g. HTTP based, IRC based or P2P based botnets). Botnet detection based on traffic application classification focuses on classifying traffic into IRC traffic and non-IRC traffic, and thus it can only detect IRC based botnets, which is its biggest limitation when compared with anomaly based botnet detection.

In this paper, we focus on traffic classification based botnet detection. Instead of labeling and filtering traffic into non-IRC and IRC, we propose a generic approach to classify traffic into different application communities (e.g. P2P, Chat, Web, etc.). Then, within each specific application community, we investigate and apply the temporal-frequent characteristics of network flows to differentiate malicious botnet behaviors from normal application traffic. The major contributions of this paper include: (1) a novel application discovery approach for classifying network applications in a large-scale WiFi ISP network, (2) a new algorithm to discriminate botnet IRC from normal IRC traffic, which is based on the n-gram (frequent characteristics) of flow payload over a time period (temporal characteristics), and (3) a botnet detection framework for detecting any type of botnet.

The rest of the paper is organized as follows. Section II presents our application classification approach for network flows. Section III is the botnet detection algorithm based on the temporal-frequent characteristics of botnets. Section IV is the experimental evaluation of our detection model with over 60 million flows collected on a large-scale WiFi ISP network. Finally, some concluding remarks and future work are given in Section V.

This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE "GLOBECOM" 2008 proceedings. 978--444-34-8/08/$5.00 2008 IEEE. Authorized licensed use limited to: University of New Brunswick. Downloaded on May 8, 009 at : from IEEE Xplore. Restrictions apply.

II. TRAFFIC APPLICATION CLASSIFICATION

Classifying network traffic into different applications is very challenging and is still an open issue. In practice, traffic application classification relies to a large extent on transport layer port numbers, which was an effective method in the early days of the Internet. Port numbers, however, provide very limited information nowadays. An alternative is to examine the payload of network flows and then create signatures for each application. This, however, has two major limitations: (1) legal issues related to privacy, and (2) it is impossible to identify encrypted traffic. By observing traffic on a large-scale WiFi ISP network, we found that even with the flow content examination method, about 40% of network flows cannot be classified into specific applications (i.e. 40% of network flows are labeled as unknown applications). Investigating such a large amount of unknown traffic is unavoidable, since it might stand for abnormalities in the traffic, malicious behaviors or simply novel applications.
Next we first discuss the payload signatures based classification approach and then present the cross association clustering algorithm for classifying the unknown traffic into different known application communities.

A. Payload Signatures Based Classification

The payload signatures based classifier investigates the characteristics of bit strings in the packet payload. For most applications, the initial protocol handshake steps are usually different and thus can be used for classification. Moreover, the protocol signatures can be modeled either through public documents like RFCs or through empirical analysis that derives the distinct bit strings on both TCP and UDP traffic. The classifier is deployed on a large-scale free wireless fidelity (WiFi) network, and the classification results show that about 40% of flows cannot be classified by the current payload signatures based classification method. Next, we present a fuzzy cross association clustering algorithm in order to address this issue.

B. Unknown Traffic Classification

The traditional port-based classification method has proven to be misleading due to the increase of applications tunneled through HTTP, the constant emergence of new protocols and the domination of P2P networking [13]. Examining the payload signatures of applications improves the classification accuracy, but a large amount of traffic still cannot be identified. Recent studies on application classification include "applying machine learning algorithms for clustering and classifying traffic flows" [14], "statistical fingerprint based classification" [15] and "identifying traffic on the fly" [16]. Different from the previous approaches, our method is hybrid, combining payload signatures with a novel cross association clustering algorithm [17]. The payload signatures classify traffic into predefined known application communities. The unknown traffic is then assigned to different application communities with a set of probabilities by using a clustering algorithm.
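The signature stage of this hybrid scheme can be illustrated with a minimal sketch. It assumes that a flow is represented only by the first bytes of its payload, and the signature table is purely hypothetical: the IRC and HTTP tokens below are well-known protocol keywords chosen for illustration, not the signature set used in the paper. Flows that match no signature are kept as "unknown" for the clustering stage.

```python
from typing import Dict, List, Tuple

# Hypothetical signature table (an assumption for illustration only):
# application community -> byte prefixes that may open a flow.
SIGNATURES: Dict[str, Tuple[bytes, ...]] = {
    "IRC": (b"NICK ", b"USER ", b"PASS ", b"JOIN "),
    "Web": (b"GET ", b"POST ", b"HTTP/1."),
}


def classify_payload(payload: bytes) -> str:
    """Return the first community whose signature prefixes the payload."""
    for app, prefixes in SIGNATURES.items():
        if payload.startswith(prefixes):  # startswith accepts a tuple
            return app
    return "unknown"


def split_known_unknown(payloads: List[bytes]):
    """Separate flows labeled by a signature from flows left unknown."""
    known, unknown = [], []
    for payload in payloads:
        label = classify_payload(payload)
        if label == "unknown":
            unknown.append(payload)
        else:
            known.append((label, payload))
    return known, unknown
```

In this sketch the unknown list is what a later clustering step would receive; encrypted payloads naturally fall into it because no prefix can match.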
Unknown traffic that cannot be classified into any known application community is considered to belong to new or unknown applications.

The basic idea of applying the cross association algorithm is to study the association relationship between known traffic and unknown traffic. In numerous data mining applications, a large and sparse binary matrix is used to represent the association between two kinds of objects (corresponding to rows and columns). Cross associations are then defined as a set of rectangular regions with different densities. The clustering goal is to summarize the underlying structure of object associations by decomposing the binary matrix into disjoint row and column groups such that the rectangular intersections of groups are homogeneous, with high or low densities. Previous association clustering algorithms need the number of clusters (i.e. rectangles) to be predefined. This is not realistic in our unknown traffic classification, because the actual number of applications is unknown. The basis of our unknown traffic classification methodology is a novel cross association clustering algorithm that estimates the number of row and column groups fully automatically [17].

During classification, the traffic, consisting of unknown and known flows, is first clustered in terms of source IP and destination IP. A set of rectangles is generated after this stage. We define these rectangles as communities, each of which either includes a set of flows or is empty. Then the flows in each community are clustered in terms of destination IP and destination port. Similarly, one community is decomposed into several sub-communities, each of which represents an application community. After all flows are classified into different application communities, we have to label each application community. A simple and effective way is to label each application community based on its content. In particular, we count the number of flows for each known application in the community and normalize the counts into a set of probabilities ranging from 0 to 1.
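The normalization step just described can be sketched as follows. This is only an illustration under simple assumptions: each flow in a community carries a string label, and the label "unknown" marks flows that no payload signature matched; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def community_probabilities(labels: Iterable[str]) -> Dict[str, float]:
    """Normalize per-application counts of known flows into probabilities.

    Flows labeled "unknown" are ignored when counting, so the returned
    values sum to 1 over the known applications seen in the community.
    """
    counts = Counter(label for label in labels if label != "unknown")
    total = sum(counts.values())
    if total == 0:
        return {}  # the community holds no known flows at all
    return {app: n / total for app, n in counts.items()}


# Usage: a community with three IRC flows, one Web flow, one unknown flow.
probs = community_probabilities(["IRC", "IRC", "IRC", "Web", "unknown"])
print(probs)
```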
The unknown flows in each application community are then assigned to a specific application according to this set of probabilities. This idea is similar to the membership function in fuzzy clustering algorithms, and the experimental evaluation supports its accuracy and efficiency. An exception to this labeling method is that if the dominant flow in the community is the unknown flow, the whole community is labeled as "unknown", which has the potential of discovering new or unknown applications.

III. BOTNET DETECTION BASED ON IRC COMMUNITY

A general aim of intrusion detection is to find various attack types by modeling signatures of known intrusions (misuse detection) or profiles of normal behaviors (anomaly detection). Botnet detection, however, is more specific because it targets a given application domain. The n-gram byte distribution has proven its efficiency in detecting network anomalies. In [18], Wang et al. examined the 1-gram byte distribution of the packet payload, represented each packet as a 256-dimensional vector describing the occurrence frequency of each of the 256 ASCII characters in the payload, and then constructed the normal packet profile by calculating the statistical average and deviation of normal packets for a specific application service (e.g. HTTP). An anomaly is alerted once the Mahalanobis distance between the testing data and the normal profile exceeds a predefined threshold. Gu et al. improve this approach and apply it to detecting malware infection in their recent work [19].

Different from previous n-gram based detection approaches, our method extends the n-gram frequency into the temporal domain and generates a set of 256-dimensional vectors representing the temporal-frequent characteristics of the 256 ASCII binary bytes on the payload over a predefined time interval. The temporal feature is important in botnet detection due to two empirical observations of botnet behaviors: (1) the response time of bots is usually immediate and accurate once they receive commands from the botmaster, while normal human behaviors might perform an action with various possibilities after a reasonable thinking time, and (2) bots basically have preprogrammed activities based on the botmaster's commands, and thus all bots might be synchronized with each other.
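As a rough sketch of the feature described above, the following function builds one 256-dimensional byte-frequency vector from the payloads observed during a single time window. The function name and the choice to pool all payloads of the window into one vector are assumptions made for illustration; the paper does not spell out these details.

```python
from collections import Counter
from typing import Iterable, List


def byte_frequency_vector(payloads: Iterable[bytes]) -> List[float]:
    """1-gram byte distribution of all payload bytes seen in one time window.

    Entry j is the relative frequency of byte value j (0..255), so the
    vector sums to 1 whenever at least one payload byte was observed.
    """
    counts: Counter = Counter()
    total = 0
    for payload in payloads:
        counts.update(payload)  # iterating bytes yields ints in 0..255
        total += len(payload)
    if total == 0:
        return [0.0] * 256
    return [counts[value] / total for value in range(256)]


# Usage: two identical bot-like messages inside the same window.
vec = byte_frequency_vector([b"JOIN", b"JOIN"])
print(len(vec), vec[ord("J")])
```

Computing one such vector per flow and per window gives the data objects that a clustering step can consume.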
After obtaining the n-gram (n = 1 in this case) features for flows over a time window, we apply the K-means algorithm to cluster the data objects with 256-dimensional features. We do not construct normal profiles, because normal traffic is sensitive to the practical networking environment and a high false positive rate might be generated when deploying the trained model in a new environment. In contrast, K-means clustering is unsupervised and does not define a threshold that needs to be tuned for different cases. In our approach, the number of initial clusters for K-means is 2. We denote the 256-dimensional n-gram byte distribution as a vector $\langle f_1^{t_i}, f_2^{t_i}, \ldots, f_{256}^{t_i} \rangle$, where $f_j^{t_i}$ stands for the frequency of the $j$-th ASCII character on the payload over a time window $t_i$ ($j = 1, 2, \ldots, 256$ and $i = 0, 1, \ldots$). Given a set of $N$ data objects $F = \{F_i \mid i = 1, 2, \ldots, N\}$, where $F_i = \langle f_j^{t_i} \rangle$, the detection approach is described in Algorithm I.

In practice, labeling the clusters is always a challenging problem when applying unsupervised algorithms to intrusion detection. By observing normal IRC traffic over a long period on a large-scale WiFi ISP network and IRC botnet traffic collected on a honeypot, we derive a new metric, the standard deviation σ of each cluster, to differentiate the botnet IRC cluster from normal IRC clusters. The higher the value of the average σ over ASCII characters for the flows in a cluster, the more normal the cluster is. This is reasonable because in normal IRC traffic human behavior is more diverse, with various possibilities, compared to the malicious IRC traffic generated by bots.
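The clustering and labeling idea described above can be sketched in a few lines of standard-library Python. This is a toy illustration, not the authors' implementation: the K-means variant (Lloyd iterations with farthest-point seeding), the function names, and the use of the population standard deviation are all assumptions. Following the observation that bot traffic is less diverse, the sketch labels the cluster with the lowest average standard deviation as the botnet cluster.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def kmeans(points: List[Vector], k: int = 2, max_iter: int = 100,
           tol: float = 1e-9) -> List[int]:
    """Plain Lloyd iterations; seeding by farthest-point traversal (assumption)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    assignment = [0] * len(points)
    for _ in range(max_iter):
        assignment = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return assignment


def average_std(cluster: List[Vector]) -> float:
    """Mean over feature positions of the standard deviation across flows."""
    dims = len(cluster[0])
    return sum(statistics.pstdev([v[j] for v in cluster]) for j in range(dims)) / dims


def botnet_cluster(points: List[Vector], k: int = 2) -> Tuple[int, List[Vector]]:
    """Return (cluster id, members) of the least diverse cluster."""
    clusters: Dict[int, List[Vector]] = {}
    for p, a in zip(points, kmeans(points, k)):
        clusters.setdefault(a, []).append(p)
    return min(clusters.items(), key=lambda item: average_std(item[1]))


# Usage with toy 3-dimensional vectors instead of 256-dimensional ones.
bots = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.49, 0.51, 0.0]]
humans = [[0.0, 0.2, 0.8], [0.1, 0.0, 0.9], [0.3, 0.1, 0.6]]
cluster_id, members = botnet_cluster(bots + humans)
print(cluster_id, len(members))
```

Because the labeling rule compares clusters only with each other, it needs no trained profile of normal traffic, which matches the motivation given for using an unsupervised method.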
Given the frequency vectors for $n$ flows as follows:

$$\{\langle f_1^1, f_2^1, \ldots, f_{256}^1 \rangle, \langle f_1^2, f_2^2, \ldots, f_{256}^2 \rangle, \ldots, \langle f_1^n, f_2^n, \ldots, f_{256}^n \rangle\}$$

Suppose $\sigma_j$ is the standard deviation of the $j$-th ASCII character over the $n$ flows. The average standard deviation $\bar{\sigma}$ over ASCII characters for the flows can be calculated by the following formula:

$$\bar{\sigma} = \frac{1}{256} \sum_{j=1}^{256} \sigma_j$$

ALGORITHM I
BOTNET DETECTION
Function BotDel(F) returns botnet cluster
Inputs: collection of data objects $F = \{F_i\}$, $F_i = \langle f_j^{t_i} \rangle$, $i = 1, 2, \ldots, N$
Initialization: initialize the number of clusters $k$ (e.g. $k = 2$), the cluster centers $c_1, \ldots, c_k$, and the iteration counter $q$
Repeat:
    $q \leftarrow q + 1$
    Assign data objects to clusters by determining the closest cluster center points.
    Calculate the new center point $c_{new}$ for each cluster.
Until: $\|c_{new} - c\| < th_1$ or $q > th_2$
Calculate the standard deviation for each cluster: $\sigma_1, \sigma_2, \ldots, \sigma_k$
If $\sigma_b = \min(\sigma_1, \sigma_2, \ldots, \sigma_k)$ then cluster $b$ is labeled as the botnet cluster
Return the botnet cluster $b$

[Fig. 2. Average bytes frequency over ASCIIs for normal IRC flows. x-axis: index of ASCII characters; y-axis: average bytes frequency over normal IRC.]

[Fig. 3. Average bytes frequency over ASCIIs for botnet IRC. x-axis: index of ASCII characters; y-axis: average bytes frequency over IRC botnet.]

As an example, Figures 2 and 3 illustrate the average bytes frequency over the normal IRC flows and the IRC botnet flows, respectively. The average standard deviation of bytes frequency over ASCII characters for normal IRC traffic is 0.00 and its maximum is 0.05, while the average standard deviation of bytes frequency over ASCII characters for IRC botnet traffic is 0.0009 and its maximum is 0.0, which is much smaller than that of normal IRC traffic. This observation confirms that normal human IRC traffic is more diverse than the malicious IRC traffic generated by bots.

IV. EXPERIMENTAL EVALUATION

We implement a prototype system for the approach and evaluate it on a large-scale WiFi ISP network over five consecutive business days. The botnet IRC traffic is collected on a honeypot deployed on a real network and is then aggregated into 43 flows. The time interval for flow aggregation is 1 minute. When evaluating the prototype system, we randomly insert and replay botnet traffic flows into the normal daily traffic. Since our approach is a two-stage process (i.e. unknown traffic classification first and botnet detection on the IRC application community next), the evaluation is accordingly divided into two parts: (1) performance testing of unknown traffic classification, which covers not only the capability of our approach to classify unknown IRC traffic but also the classification accuracy for other unknown applications (e.g. new P2P), since we expect that the algorithm could be extended to detect various types of botnets, such as Web based and P2P based botnets; (2) performance evaluation of the system in discriminating malicious IRC botnet traffic from normal human IRC traffic.

A. Evaluation on Unknown Traffic Classification

Evaluating the unknown traffic classification capability is not an easy task in practice, since we have no knowledge of novel or recently appeared applications and the task always needs the intervention of network experts. During our experiment, we randomly choose part of the known traffic and force its label to unknown. The amount of this label-free traffic is decided according to the 40% rule. The final unknown traffic set is composed of the forcibly relabeled known traffic and the 43 botnet IRC flows. Over the five days of evaluation, we found that all the botnet flows can be accurately classified into the IRC application community (i.e. 100% classification rate for IRC traffic).
However, the general classification accuracy over all applications is about 85%, which is lower than that for the specific IRC application. The general classification accuracy is an average over all applications, since the approach has a different classification rate for each application community. Table I describes the known application set and the unknown application set over one hour, e.g. how many known applications the flows belong to.

TABLE I
DESCRIPTION ON KNOWN AND UNKNOWN SET OVER ONE HOUR

Set           Number of flows   Number of applications
Known set     76484             38
Unknown set   39408

B. Evaluation on Discriminating Botnet from Normal IRC

The proposed approach is evaluated with five full consecutive days of traffic. Table II shows the flow distribution for the IRC application community and the total flows for each day after the traffic classification step. Two metrics are used to evaluate the performance of discriminating botnet traffic from normal IRC traffic, namely Detection Rate (DR) and False Alarm Rate (FAR). DR is the ratio of the number of botnet flows detected to the total number of botnet flows, and FAR is the ratio of the number of false botnet alarms to the total number of alarms.

TABLE II
DESCRIPTION ON IRC COMMUNITIES OVER FIVE DAYS

Day   Total     Known     Total IRC   Known IRC
1     35409K    374K      606         363
2     9538K     833K      569         36
3     357K      574K      53          0
4     3693K     0596K     64
5     3375K     096K      87          44

Table III lists the DR and FAR for all five days of detection, and Table IV accordingly lists the average standard deviation over the characters of the payload collected on the network for each cluster.

TABLE III
DETECTION PERFORMANCE OVER FIVE DAYS

Day   DR (%)   FAR (%)
1     100.0    8.9
2     100.0    6.8
3     77.8     3.
4     100.0    .6
5     100.0    5.0

TABLE IV
STANDARD DEVIATION OF BYTES FREQUENCY OVER ASCIIS FOR NORMAL AND BOTNET CLUSTERS

Day   Normal clusters   Botnet clusters
1     0.005             0.0005
2     0.009             0.007
3     0.005             0.0006
4     0.003             0.0005
5     0.005             0.0006

From Table II, we see that the total number of flows we collect for one day is over 30M and the total number of known flows that can be labeled by the payload signatures is over 0M. The number of IRC flows over the five consecutive days ranges from 00 to 600, which is a very small part of the total flows. Our traffic classification approach classifies the unknown IRC flows into the IRC application community with a 100% classification rate over the five days of evaluation. The detection rate for differentiating bot IRC traffic from normal human IRC traffic is 100% on four days of testing, while an exception happens on the 3rd day, on which our approach obtained a 77.8% detection rate with a 3.% false alarm rate. The best result over the five days of testing is a 100% detection rate with only a .6% false alarm rate. Moreover, the results in Table IV indicate that the average standard deviation of bytes frequency over the ASCIIs on the flow payload is an important metric for distinguishing normal human IRC clusters from malicious IRC traffic generated by machine bots.

V. CONCLUSION

In this paper we attempt to build a taxonomy of existing botnet detection approaches and classify them into three categories, namely honeypot based, passive anomaly analysis based and traffic application classification based. As claimed by Gu et al., anomaly based botnet detection approaches have the potential to find different types of botnets, while existing traffic classification approaches only focus on differentiating malicious IRC traffic from normal IRC traffic, which is considered their biggest limitation. In this paper, we address this limitation by presenting a novel generic application classification approach. Through it, unknown applications on the current network are classified into different application communities, such as the Chat (or, more specifically, IRC) community, the P2P community, the Web community, etc. Since botnets exploit existing application protocols, detection can be conducted in each specific community. As a result, our approach can be extended to find different types of botnets. In particular, we evaluate our framework on the IRC community in this paper, and the evaluation results show that our approach obtains a very high detection rate with a low false alarm rate when detecting IRC botnet traffic. In particular, we formalize botnet behaviours by using the average standard deviation of bytes frequency over ASCIIs on the traffic payload, and derive an important bot identification strategy: the higher the value of the average deviation, the more human-like the IRC traffic. This indication strategy is important when using unsupervised clustering algorithms for botnet detection in later research.
In the near future, we will evaluate our approach on the Web specific community and test its performance on Web based botnets. Some novel P2P botnet construction methods have been proposed and investigated in [], and as a result we will also conduct an evaluation of our approach with the newly appeared P2P botnets.

ACKNOWLEDGMENT

The authors graciously acknowledge the funding from the Atlantic Canada Opportunity Agency (ACOA) through the Atlantic Innovation Fund (AIF) to Dr. Ghorbani.

REFERENCES

[1] P. Barford and V. Yegneswaran, "An inside look at Botnets," Special Workshop on Malware Detection, Advances in Information Security, Springer Verlag, ISBN: 0-387-370-7, 2006.
[2] The Honeynet Project & Research Alliance, "Know your enemy: Tracking botnets," http://www.honeynet.org, March 2005.
[3] M.A. Rajab, J. Zarfoss, F. Monrose, and A. Terzis, "A multifaceted approach to understanding the botnet phenomenon," Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement, pp. 4-5, October 2006.
[4] P. Baecher, M. Koetter, T. Holz, M. Dornseif, and F. Freiling, "The nepenthes platform: an efficient approach to collect malware," Proceedings of Recent Advances in Intrusion Detection, LNCS 49, Springer-Verlag, pp. 65-84, Hamburg, September 2006.
[5] V. Yegneswaran, P. Barford, and V. Paxson, "Using honeynets for internet situational awareness," Proceedings of the 4th Workshop on Hot Topics in Networks, College Park, MD, November 2005.
[6] Z.H. Li, A. Goyal, and Y. Chen, "Honeynet-based botnet scan traffic analysis," Botnet Detection: Countering the Largest Security Threat, in Series: Advances in Information Security, Vol. 36, W.K. Lee, C. Wang, D. Dagon (Eds.), Springer, ISBN: 978-0-387-68766-7, 2008.
[7] G.F. Gu, J.J. Zhang, and W.K. Lee, "BotSniffer: detecting botnet command and control channels in network traffic," Proceedings of the 15th Annual Network and Distributed System Security Symposium, San Diego, CA, February 2008.
[8] A. Karasaridis, B. Rexroad, and D. Hoeflin, "Wide-scale botnet detection and characterization," Proceedings of the 1st Conference on 1st Workshop on Hot Topics in Understanding Botnets, Cambridge, MA, 2007.
[9] J. R. Binkley and S. Singh, "An algorithm for anomaly-based botnet detection," USENIX SRUTI: 2nd Workshop on Steps to Reducing Unwanted Traffic on the Internet, July 2006.
[10] W. T. Strayer, R. Walsh, C. Livadas, and D. Lapsley, "Detecting botnets with tight command and control," Proceedings 2006 31st IEEE Conference on Local Computer Networks, pp. 95-0, Nov. 2006.
[11] W. T. Strayer, D. Lapsley, R. Walsh, and C. Livadas, "Botnet Detection Based on Network Behavior," Botnet Detection: Countering the Largest Security Threat, in Series: Advances in Information Security, Vol. 36, W.K. Lee, C. Wang, D. Dagon (Eds.), Springer, ISBN: 978-0-387-68766-7, 2008.
[12] C. Livadas, R. Walsh, D. Lapsley, and W.T. Strayer, "Using machine learning techniques to identify botnet traffic," Proceedings 2006 31st IEEE Conference on Local Computer Networks, pp. 967-974, Nov. 2006.
[13] A. W. Moore and K. Papagiannaki, "Toward the accurate identification of network applications," Proceedings of 6th International Workshop on Passive and Active Network Measurement, pp. 4-54, Boston, MA, March 2005.
[14] N. Williams, S. Zander and G. Armitage, "A preliminary performance comparison of five machine learning algorithms for practical IP traffic flow classification," ACM SIGCOMM Computer Communication Review, Vol. 36, Issue 5, pp. 5-6, 2006.
[15] M. Crotti, M. Dusi, F. Gringoli and L. Salgarelli, "Traffic classification through simple statistical fingerprinting," ACM SIGCOMM Computer Communication Review, Vol. 37, Issue, pp. 5-6, 2007.
[16] L. Bernaille, R. Teixeira, I. Akodkenou, A. Soule, and K. Salamatian, "Traffic classification on the fly," ACM SIGCOMM Computer Communication Review, Vol. 36, Issue, pp. 3-6, 2006.
[17] D. Chakrabarti, S. Papadimitriou, D. Modha, and C. Faloutsos, "Fully Automatic Cross-Associations," Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 79-88, Seattle, Washington, August 22-25, 2004.
[18] K. Wang and S. Stolfo, "Anomalous payload-based worm detection and signature generation," Proceedings of the 8th International Symposium on Recent Advances in Intrusion Detection (RAID), Seattle, WA, 2005.
[19] G. F. Gu, P. Porras, V. Yegneswaran, M. Fong, and W.K. Lee, "BotHunter: detecting malware infection through IDS-Driven dialog correlation," Proceedings of the 16th USENIX Security Symposium, Boston, MA, August 2007.
[20] P. Wang, S. Sparks, and C. Zou, "An advanced hybrid peer-to-peer botnet," Proceedings of the 1st Conference on 1st Workshop on Hot Topics in Understanding Botnets, Cambridge, MA, 2007.
[21] C. Zou and R. Cunningham, "Honeypot-aware advanced botnet construction and maintenance," Proceedings of International Conference on Dependable Systems and Networks, June 2006.