Honey Bee Intelligent Model for Network Zero Day Attack Detection


¹AMAN JANTAN, ²ABDULGHANI ALI AHMED
School of Computer Sciences, Universiti Sains Malaysia (USM), Penang, Malaysia
¹aman@cs.usm.my, ²almohimid@yahoo.com

Abstract

This paper proposes an intelligent system for detecting zero day attacks based on the honey-bee defence mechanism in nature. The proposed system consists of several units deployed at distributed locations in the network. These units perform complementary tasks to monitor and detect network attacks, including zero day attacks. The system units recognize a network attack through two phases of investigation: the Undesirable-Absent (UA) and Desirable-Present (DP) investigation phases. Attacks are recognized by investigating the absence of malicious features and the presence of normal features. The UA method is used in the first phase of network investigation for matching malicious behaviour based on the absence of attack signatures. The DP method is used in the next phase, unknown attack detection, for matching normal behaviour based on the presence of normal traffic signatures. A neural network trained with the Back Propagation (BP) algorithm is used to learn the patterns of network attacks as well as the patterns of normal traffic. The performance of the proposed intelligent system is evaluated using the KDD 99 dataset. The results demonstrate that the proposed defence technique is applicable and can detect network zero day attacks while maintaining a low rate of false alarms.

Keywords: Honeybee approach, Intelligent system, Zero day attack, Neural networks.

1. Introduction

The Internet has rapidly become a critical component of the national infrastructure, but its lack of effective security mechanisms creates a fertile environment for malicious users. Internet viruses, worms, and distributed denial of service (DDoS) attacks cause major disruptions and economic loss.
More alarming are stealthy attacks aimed at stealing sensitive information and compromising vital systems. Trillions of dollars in transactions are processed daily at major financial institutions. For instance, Visa processes 4,000 transactions per second, which means that if Visa's system goes down for one minute because of a DDoS attack, and assuming only $100 per transaction, $24 million in transactions (4,000 × 60 × $100) is lost in that minute. Statistical reports indicate that the costs of malicious activities are unsustainable. For instance, the Conficker worm attacked the armed forces network in France and disrupted the flight of aircraft at several airbases in [...]. The cleanup cost for worms such as SoBig and Klez amounted to USD 80 billion in [...]. In 2008, cyber criminals stole more than a trillion USD in intellectual property [1].

One of the main challenges in securing computer networks against the attacks mentioned above is the inability to directly detect unknown attacks in a given network. Detecting unknown attacks is the real proof of a defence system's effectiveness when it is deployed in a real-world network, which may be very different from the testing environment. However, most existing defence systems generate a high rate of false alarms when the network experiences a zero-day attack. According to [2], existing network security systems typically use protection techniques based on known features of attack vulnerabilities. Such techniques are no longer applicable when zero-day attacks are considered. Unfortunately, without considering zero day vulnerabilities, a security system will produce questionable results at best, because it may judge a network configuration to be more secure while that configuration is actually susceptible to zero-day attacks [2]. Signature-based systems can block attacks which have already been identified.
However, the network remains exposed from the moment a zero day attack is launched until a signature or feature database is updated and deployed. Given the sophistication and speed of today's attacks, even a short time without protection can be devastating.

International Journal of Digital Content Technology and its Applications (JDCTA), Volume 8, Number 6, December

This paper proposes an intelligent and proactive model for detecting zero day attacks based on the Honey Bee protection technique. The proposed model uses a new strategy that detects unknown attacks by measuring the known features of attack traffic as well as the known features of normal traffic. As illustrated in Fig. 1, the investigation process is executed in two steps. First, network traffic is investigated against the known undesirable features. Second, network traffic is investigated against the known desirable features. Traffic that matches the undesirable features is a known attack. Traffic that does not match the undesirable features but matches the desirable features is normal traffic. Traffic that matches neither the undesirable nor the desirable features is traffic generated by a zero day attack.

The rest of the paper is organized as follows: Section 2 discusses the related work; Section 3 describes the architecture of the proposed system (i.e., UA unit, DP unit, attack detection phases, and zero day attack recognition mechanism); Section 4 presents the implementation and experimental results; Section 5 provides the conclusion and future work; and Section 6 includes the acknowledgment.

2. Related Work

This section gives a basic review of the most important network protection technology, i.e. the intrusion detection system (IDS). In general, there are two main categories of IDS, each with its own disadvantages: signature-based and anomaly-based detection systems [3, 4]. Signature-based scanning provides a granular layer of protection against viruses, worms, spyware, malware, and other network security threats by identifying known malicious attacks, as illustrated in Fig. 1. However, the main drawback of the signature-based category is its inability to detect new attacks that are still unknown to the attack detector [5].
Thus, the security policy of these approaches must add new rules whenever a new type of attack is discovered.

Figure 1. Known Attack Detection (diagram: known attack, unknown attack, and non-attack traffic enter the malicious features investigation; known attacks are blocked, while unknown attacks and non-attack traffic are not blocked)

The disadvantage of the anomaly-based category is the possibility that normal traffic deviates from its distribution pattern signatures [6]. In addition to the signature-based and anomaly-based schemes, existing detection schemes also include hybrids of both. One important subcategory of the hybrid schemes is the artificial intelligence-based detection system. An intelligent system (IS) is a technique that emulates some characteristics of intelligence exhibited by nature, such as learning, adaptability, and reasoning, as well as the ability to manage uncertain information [7]. Intelligent systems are used to support decision-making on problems that are difficult or impossible to solve otherwise, and to obtain consistent and efficient results [8]. Accordingly, intelligent hybrid IDSs are constructed based on Neural Networks (NN), Fuzzy Inference Systems (FIS), Probabilistic Reasoning (PR), and derivative-free optimization techniques such as Evolutionary Computation (EC) [9].

A hybrid intrusion detection approach based on fuzzy clustering and an artificial neural network (FC-ANN) for detecting low-frequency attacks was proposed in [10]. The FC-ANN approach uses a fuzzy clustering technique to categorize the training data into several subcategories and uses the subcategories to train the ANN. FC-ANN then finds the membership grades of these subcategories and combines them through a new ANN to get the final results. It was evaluated on the KDD CUP 1999 dataset for both the training and testing phases. The obtained results show that FC-ANN is more accurate than naïve Bayes and BPNN.

The authors of [11-13] proposed another computational intelligence approach using dynamic self-organizing maps (DSOM) and ant colony optimization (ACO) clustering. This approach comprises four phases. The first phase determines the shape and size of the network during the training process using the DSOM, which is an unsupervised neural network. The second phase uses ACO clustering to select and cluster the objects from the output layer of the DSOM based on the shortest distance. The third phase labels the objects as a normal cluster or an anomalous cluster using the cluster labeling algorithm, which basically depends on DSOM and ACO clustering. In the last phase, detection is handled based on Bayes' theorem. The experiment for this approach was done on the KDD99 dataset. The experimental results demonstrated higher performance than SVM and K-NN. Nevertheless, the results were not reported numerically, which makes comparison and evaluation against similar approaches difficult.

A recent IDS approach inspired by bees' defensive behavior in nature is proposed in [14]. In this approach, nest-mates are discriminated from non-nest-mates using the Undesirable-Absent (UA) or Desirable-Present (DP) methods and the Filtering Decision (FD) method. The UA method is responsible for detecting known attacks based on their predefined signatures. The DP method is used to detect anomalous behavior based on trained behavior patterns. The normal patterns are learned by training the neural network with the Bees Algorithm (BA).
Lastly, the FD method trains the UA detector and recognizes new attacks in real time.

The system proposed in this paper extends and improves the work done in [14]. The first improvement is extending the approach to identify network attacks and trace back their sources in a distributed way. The second is an efficient method for communication among domain members. Moreover, the coordination among domain edges in the proposed system helps in detecting zero day attacks that may be launched using distributed techniques.

Figure 2. UA and DP Investigation Phases in Nature (diagram labels: Nest Mate, Unknown, UA Phase, Known, DP Phase)

3. Zero Day Attack Detection

In nature, honeybee guards accept incomers if they have UA or DP characteristics [14]. However, these characteristics would be seen on most incomers. Thus, inspecting either UA or DP alone is not applicable in the network security domain. The proposed system uses a combination of both UA and DP to reduce the false alarm rate. Attack recognition is then achieved by several integrated units. Communication among the various units of the system during the investigation process uses distributed and one-to-one techniques. The data needed for the investigation process is collected and processed through the various phases in a distributed way. Attack recognition proceeds through two main phases: the undesirable-absent (UA) phase and the desirable-present (DP) phase. The phases and algorithm of the attack recognition mechanism in nature are illustrated in Fig. 2.

In computer networks, the attack recognition idea proposed in this article, inspired by the Honey-Bee technique, is designed around the UA and DP features. UA and DP patterns are created from the malicious features and the normal features respectively. Undesirable (malicious) features are extracted from a number of malicious records in the KDD99 data set. The neural network receives these malicious packets from the data set and analyzes their undesirable features for misuse intrusion. The undesirable behaviors of the malicious packets are thus taught to the system as attack signatures in order to create the UA patterns. The desirable (normal) behaviors are extracted from the normal records in the same KDD99 data set. The neural network also receives the normal packets from the data set and analyzes their desirable behavior. The neural network then trains the system units on the desirable behaviors so that they can differentiate between malicious (undesirable) and normal (desirable) behaviors.
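The separation of data set records into UA and DP training material described above can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the function name `split_ua_dp` is hypothetical, the sample rows are made up and shortened, and the sketch assumes the label convention of the public KDD Cup 99 files, where the class label is the last field and normal traffic is labelled `normal.`.

```python
from typing import Iterable, List, Sequence, Tuple

Record = Sequence[str]  # feature values followed by the class label


def split_ua_dp(records: Iterable[Record],
                normal_label: str = "normal.") -> Tuple[List[Record], List[Record]]:
    """Partition labelled records into UA (malicious) and DP (normal) training sets."""
    ua_records: List[Record] = []
    dp_records: List[Record] = []
    for record in records:
        label = record[-1]  # assumption: the class label is the last field
        if label == normal_label:
            dp_records.append(record)   # desirable behaviour, used for DP patterns
        else:
            ua_records.append(record)   # undesirable behaviour, used for UA patterns
    return ua_records, dp_records


# Made-up, shortened rows for illustration only (real KDD99 rows have 41 features).
sample = [
    ("0", "tcp", "http", "SF", "normal."),
    ("0", "icmp", "ecr_i", "SF", "smurf."),
    ("0", "tcp", "private", "S0", "neptune."),
    ("2", "tcp", "smtp", "SF", "normal."),
]
ua_set, dp_set = split_ua_dp(sample)
print(len(ua_set), "malicious records for UA patterns")
print(len(dp_set), "normal records for DP patterns")
```

Each subset would then be passed to the neural network training step for the corresponding unit.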
Figure 3. Honeybee-based Zero Day Attack Detection (diagram: known attack, unknown attack, and non-attack traffic pass the malicious features investigation, where known attacks are blocked, and then the normal features investigation, where unknown attacks are blocked and non-attack traffic is not blocked)
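The decision flow of Figure 3 can be sketched as a two-stage check. The sketch below is an illustration under stated assumptions rather than the paper's implementation: the flow representation, the `classify_flow` function, and the toy matchers with their thresholds are all hypothetical stand-ins for the trained UA and DP units.

```python
from typing import Callable, Dict

Flow = Dict[str, float]            # hypothetical flow representation
Matcher = Callable[[Flow], bool]   # True when the flow matches a pattern set


def classify_flow(flow: Flow, matches_ua: Matcher, matches_dp: Matcher) -> str:
    """Two-phase check: UA (malicious features) first, then DP (normal features)."""
    if matches_ua(flow):
        return "known attack"      # matched a UA pattern: blocked
    if matches_dp(flow):
        return "normal"            # matched a DP pattern: not blocked
    return "zero day attack"       # matched neither pattern set: blocked


# Toy matchers standing in for the trained units (illustrative thresholds only).
def toy_ua(flow: Flow) -> bool:
    return flow["syn_rate"] > 0.9


def toy_dp(flow: Flow) -> bool:
    return flow["syn_rate"] < 0.2


for rate in (0.95, 0.10, 0.50):
    print(rate, "->", classify_flow({"syn_rate": rate}, toy_ua, toy_dp))
```

Running the UA check first means a flow is only compared against the normal patterns once it has failed to match any known attack signature, which mirrors the order of the two investigation phases.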

The proposed system detects zero day attacks through two complementary phases: the malicious features investigation phase and the normal features investigation phase, as shown in Fig. 3. In the former phase, UA units deployed at the network edges inspect every flow entering the network and compare its behavior with the UA patterns. If the incoming flows have the same signatures as the UA patterns, the UA units perform two investigation processes. First, the UA unit performs a primary investigation to examine the main features of these flows. Based on the primary investigation, the UA unit predicts the type of the flows and classifies them into malicious, malware, or unsolicited flows. Second, the UA units report the flows classified as malicious to the network gateways to be blocked. The UA units use the One-to-One connection [6] to report these malicious flows to the proper gateway in a scalable way. On the other hand, the UA units pass the incoming flows that do not have the same signatures as the UA patterns to the normal features investigation phase.

In the normal features investigation phase, the DP unit, through its different sub-units, receives the suspicious flows reported by the UA units to verify whether they are malicious. For this purpose, the DP unit inspects the suspicious flows to check whether they match the features of the DP patterns. It should be mentioned that the DP features can be specific features such as marks or signatures predefined among the various units of the system. The technique of using these marks and deploying their modules in the proposed system units will be examined in more detail in a future study. If the investigated flows match the DP patterns, the DP units report them as normal flows that should be allowed to reach their destination. The DP units also use a distributed technique to report to all the network edges, thereby preventing attacks from entering through any gateway edge.
Accordingly, the fraction of flows that matches the UA patterns is classified as malicious, and the fraction that matches the DP patterns is classified as normal traffic. The key concern of this paper is the fraction of flows that matches neither the UA patterns nor the DP patterns. The philosophy of this research is that as long as this fraction does not match the UA patterns, it is not a known attack, and as long as it does not match the DP patterns, it is not normal traffic. Based on these two observations, this fraction of flows is an attack that is not yet known. Therefore, the proposed system classifies this type of flow as a zero day attack.

4. Experimental Result

The experiment in this paper is conducted on the KDD99 data set, which includes 41 attributes as the input data set and one attribute as the target data set. The values of the target range from 1 to 3 (normal, attack, and unknown). To perform the experiments on the proposed system, the artificial neural network (ANN) and back propagation (BP) algorithm configurations are set up.

4.1. ANN Training and Testing

For this experiment, the neural network setting involves three layers: an input layer, a hidden layer, and an output layer. A multi-layer architecture is adopted for the ANN, and the back propagation (BP) algorithm is used as the training algorithm. Concerning the learning type, supervised learning is more appropriate because the measurements and observations of the target function are known. The attack detection process is taken as the target function, and it has three values (normal, attack, and unknown), where unknown represents the zero day attack. MATLAB is used for building, training, and testing the ANN model. Figure 4 shows the architecture of the proposed multi-layer ANN.

Figure 4. Neural Network (BP approach with 2 Layers)
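The training setup described above (one hidden layer, supervised learning with back propagation, three target classes) can be illustrated with a small pure-Python sketch. It is not the MATLAB model used in the paper: the function `train_bp`, the sigmoid hidden units, the linear output units, the learning rate, and the four-feature toy data are all assumptions chosen to keep the example short, whereas the actual experiment uses 41 input attributes.

```python
import math
import random


def train_bp(samples, n_hidden=5, lr=0.05, epochs=200, seed=0):
    """Train a one-hidden-layer network with back propagation.

    samples: list of (inputs, one_hot_target) pairs.
    Returns (predict, mse_history), with one MSE value before training
    and one after each epoch.
    """
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    # Each weight row carries a trailing bias weight.
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(x):
        xb = list(x) + [1.0]
        hidden = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, xb))))
                  for row in w_hid]                       # sigmoid hidden layer
        hb = hidden + [1.0]
        out = [sum(w * v for w, v in zip(row, hb)) for row in w_out]  # linear output
        return xb, hb, out

    def mse():
        total = 0.0
        for x, t in samples:
            out = forward(x)[2]
            total += sum((o - ti) ** 2 for o, ti in zip(out, t))
        return total / (len(samples) * n_out)

    history = [mse()]
    for _ in range(epochs):
        for x, t in samples:
            xb, hb, out = forward(x)
            err_out = [o - ti for o, ti in zip(out, t)]   # output error (linear units)
            # Propagate the error back through the output weights (bias excluded).
            err_hid = [
                hb[j] * (1.0 - hb[j]) * sum(err_out[k] * w_out[k][j] for k in range(n_out))
                for j in range(n_hidden)
            ]
            for k in range(n_out):
                for j in range(n_hidden + 1):
                    w_out[k][j] -= lr * err_out[k] * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_hid[j][i] -= lr * err_hid[j] * xb[i]
        history.append(mse())

    def predict(x):
        out = forward(x)[2]
        return out.index(max(out)) + 1   # classes numbered 1..3 as in the paper

    return predict, history


# Made-up toy data: 4 features instead of the 41 KDD99 attributes.
toy = [
    ([0.9, 0.8, 0.1, 0.0], [1, 0, 0]),   # class 1: normal
    ([1.0, 0.7, 0.0, 0.1], [1, 0, 0]),
    ([0.1, 0.0, 0.9, 0.8], [0, 1, 0]),   # class 2: attack
    ([0.0, 0.2, 1.0, 0.9], [0, 1, 0]),
    ([0.5, 0.5, 0.5, 0.5], [0, 0, 1]),   # class 3: unknown
    ([0.4, 0.6, 0.5, 0.4], [0, 0, 1]),
]
predict, history = train_bp(toy)
print("MSE before and after training:", round(history[0], 3), round(history[-1], 3))
```

The `history` list plays the role of a training error curve: it records the mean square error after every pass over the training samples.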

Generally, the prediction of outputs consists of two steps: learning and testing. In the learning step, a model describing a predetermined set of concepts and parameters is created by analyzing a set of subjects or instances. Each instance is assumed to belong to one predetermined group (normal, attack, or unknown). The results of the BP training algorithm indicate a reasonable level of accuracy in training, validation, and testing. Figure 5 shows that the mean square error (MSE) of the three sets is relatively acceptable, with the MSE of the training set being the smallest.

Figure 5. The performance of BP algorithm (PureLin Transfer).

As shown in Figure 5, the MSE of the training process (the blue line) decreases rapidly as the volume and duration of ANN training increase. Similarly, the MSE of the validation process (the green line) decreases as the volume and duration of ANN training increase. The test error is represented by the red line, which lies close to the validation error line. Since the test error and validation error are close, the division of the dataset is reasonable. Moreover, the closeness between the test and validation errors demonstrates the accuracy of using ANN with BP in predicting the future perceptions of the system.

4.2. Accuracy of Attack Detection

In order to decide on the optimal ANN parameters, it is important to examine the regression plot of the ANN models, including BP. Based on the obtained results, the correlation coefficients of training, validation, and testing are 0.92, [...], and [...], respectively, as shown in Figure 6. After training the ANN using the back propagation algorithm (ANNBP), 20% of the dataset of 750 subjects is used in the testing phase. This means 150 out of 750 subjects are used for testing the accuracy of the ANN model and predicting the attack signatures. The ANN correctly identifies 131 out of the 150 testing subjects. Thus, the findings indicate a reasonable level of accuracy (87.3%).
Table 1 shows the accuracy of prediction for ANNBP.

Figure 6. The correlation coefficients of training
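The split sizes and the accuracy figure reported in this subsection follow from simple arithmetic, which the short check below reproduces. The variable names are illustrative; the numbers are the ones stated in the text.

```python
total_subjects = 750      # dataset size reported in the paper
test_fraction = 0.20      # share of the dataset held out for testing
correct = 131             # correctly identified testing subjects

test_subjects = round(total_subjects * test_fraction)
train_subjects = total_subjects - test_subjects
accuracy = correct / test_subjects

print(test_subjects, train_subjects, f"{accuracy:.1%}")  # prints: 150 600 87.3%
```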

Table 1. The result of training BP algorithm
Training subjects      600
Testing subjects       150
Correctly identified   131
Accuracy               87.3%

The predicted outputs of ANNBP match 87.3% of the actual surveyed outputs (i.e. the KDD99 data set); the ANNBP therefore correctly classifies 131 subjects of the testing dataset. Table 2 shows a sample of the predicted output.

Table 2. Sample of predicted outputs
Actual Collected Values Predicted Effectiveness
[...]

It can be concluded from Table 2 that the artificial neural network is able to predict at least 87% of the malicious packets. In the above table, 10 actual values are randomly selected in order to explain the efficiency of BP in predicting the attacks. Among the predicted values, two do not match the actual values, while the remaining values are the same as the actual ones. In this experiment, the obtained results demonstrate that training the UA units on the malicious patterns based on the Honey-Bee technique is efficient for detecting known attacks. Likewise, training the DP units on the normal patterns using the Honey-Bee technique is expected to yield a similar efficiency in detecting normal packets. Since the proposed system is able to detect known attacks and normal traffic with reasonable accuracy, the traffic flows that are detected as neither attacks nor normal have a high probability of being new malicious traffic. Accordingly, the proposed system is capable of detecting zero day attacks with a low rate of false alarms.

5. Conclusion and Future Work

The protection system proposed in this article is designed as a hybrid technique between biology and computer science. The methodology of this system is inspired by the defence mechanism of the honeybee. In this paper, the honeybee defence system is the main idea used to enhance the efficiency of network protection. The focus is to improve the ability of protection systems to detect zero day attacks.
The strength of the proposed model lies in its ability to recognize unknown attacks. UA units are designed to filter the malicious flows with undesirable signatures. DP units are designed to filter the normal flows with desirable signatures. Together, the UA and DP units are designed to perform further investigation and filter the malicious flows of zero day attacks. The findings show that the system units correctly learn the UA and DP patterns of various attacks. Furthermore, the findings show the efficiency of the ANN training in recognizing novel attacks by detecting deviation from the trained patterns. The obtained results show that the accuracy of the proposed system in detecting malicious and normal flows during testing is 87.3%.

The limitation of the proposed model is the difficulty of creating the DP pattern used to differentiate between desirable and undesirable flows. This difficulty increases the number of false positive alarms during the detection of zero day attacks. The model also needs more experiments to show its efficiency in detecting zero day attacks in real time. In the future, a distributed mechanism will be developed for generating and distributing the normal features as desirable signatures. This would improve the accuracy of the DP unit in detecting normal flows while producing fewer false alarms. Moreover, future studies will add another unit for analysing and investigating network forensics. This unit will categorize traffic attacks into DDoS, worms, network scans, and botnets, and determine which one is responsible for service violations.

6. Acknowledgment

This work is supported by MOSTI ScienceFund grant number 305/PKOMP/613144, School of Computer Sciences, Universiti Sains Malaysia (USM).

7. References

[1] Tyugu, E.: Artificial intelligence in cyber defense. In: Proceedings of the 3rd International Conference on Cyber Conflict (ICCC 2011). IEEE, Tallinn, Estonia (2011).
[2] Wang, L., Jajodia, S., Singhal, A., Cheng, P., Noel, S.: k-zero day safety: a network security metric for measuring the risk of unknown vulnerabilities. (2013): 1-1.
[3] R.A. Martin: Snort - lightweight intrusion detection for networks. Proceedings of USENIX LISA '99, Seattle (1999).
[4] Thottan, M., Chuanyi, J.: Anomaly detection in IP networks. IEEE Transactions on Signal Processing 51 (2003).
[5] Ahmed, A. A., Jantan, A., Wan, T. C.: SLA-based complementary approach for network intrusion detection. Computer Communications 34(14) (2011).
[6] Ahmed, A. A., Jantan, A., Wan, T.-C.: Real-Time Detection of Intrusive Traffic in QoS Network Domains. IEEE Security & Privacy, vol. 11, no. 6, pp. 45-53, Nov.-Dec.
[7] Toosi, A.N., Kahani, M.: A new approach to intrusion detection based on an evolutionary soft computing model using neuro-fuzzy classifiers. Computer Communications 30 (2007).
[8] Pfahringer: Winning the KDD99 classification cup: Bagged boosting. KDD (2) (2000).
[9] Abraham, A.: Intelligent systems: Architectures and perspectives. In: Recent Advances in Intelligent Paradigms and Applications (pp. 1-35). Physica-Verlag HD (2003).
[10] Wang, G., Hao, J., Ma, J., Huang, L.: A new approach to intrusion detection using Artificial Neural Networks and fuzzy clustering. Expert Syst. Appl. 37(9), 102 (2010), doi: /j.eswa
[11] Feng, Y., Zhong, J., Xiong, Z., Ye, C., Wu, K.: Network Anomaly Detection Based on DSOM and ACO Clustering. In: Liu, D., Fei, S., Hou, Z., Zhang, H., Sun, C. (eds.) ISNN LNCS, vol. 4492, pp. [...]. Springer, Heidelberg (2007).
[12] Feng, Y.Z., Wu, K., Wu, Z.: An unsupervised anomaly intrusion detection algorithm based on swarm intelligence. In: Feng, Y.Z., Wu, K., Wu, Z. (eds.) Proceedings of 2005 International Conference on Machine Learning and Cybernetics, vol. 7, pp. [...]. IEEE Computer Society Press, Los Alamitos (2005).
[13] Feng, Y.J., Zhong, J., Ye, C., Wu, Z.: Clustering based on self-organizing ant colony networks with application to intrusion detection. In: Ceballos, S. (ed.) Proceedings of 6th International Conference on Intelligent Systems Design and Applications (ISDA 2006), Jinan, China, pp. [...]. IEEE Computer Society Press, Washington, DC, USA (2006).
[14] Ali, G. A., Jantan, A.: A New Approach Based on Honeybee to Improve Intrusion Detection System Using Neural Network and Bees Algorithm. In: Software Engineering and Computer Systems (pp. [...]). Springer Berlin Heidelberg (2011).