NETWORK INTRUSION DETECTION USING HIDDEN NAIVE BAYES MULTICLASS CLASSIFIER MODEL
Kanagalakshmi. R 1, V. Naveenantony Raj 2
1 Computer Science Dept., Dhanalakshmi Srinivasan Institute of Research and Technology, (India) 2 Computer Science Dept., St. Josephs College, (India)

ABSTRACT
With growing Internet connectivity and traffic volume, recent intrusion incidents have re-emphasized the importance of network intrusion detection systems for combating increasingly sophisticated network attacks. Techniques such as pattern recognition and the data mining of network events are often used by intrusion detection systems to classify network events as either normal events or attack events. The Hidden Naive Bayes (HNB) model can be applied to intrusion detection problems that suffer from high dimensionality, highly correlated features, and high-volume network data streams. HNB is a data mining model that relaxes the naive Bayes method's conditional independence assumption. This paper focuses on the Hidden Naive Bayes model. Experimental results on the Knowledge Discovery and Data Mining (KDD) Cup 1999 dataset show that the HNB model exhibits superior overall performance in terms of accuracy, error rate, and misclassification cost compared with the traditional naive Bayes model and the leading extended naive Bayes models. The HNB model also outperformed other leading state-of-the-art models, such as the Support Vector Machine, in predictive accuracy. The results further indicate that the HNB model significantly improves the accuracy of detecting denial-of-service (DoS) attacks.
Keywords: Data Mining, Intrusion Detection, Naïve Bayes Classifier, Network Security.

I. INTRODUCTION
Intrusion detection mainly focuses on distinguishing authorized from unauthorized users by separating anomalous network activity from normal network traffic. Data mining, the extraction of knowledge from large data sets, has been used to build automatic intrusion detection methods.
Intrusion detection has become a crucial element of network management because of the large number of attacks that constantly threaten our computers. Intrusion detection is defined as the process of monitoring the actions that occur in a computer system or network and identifying activity that diverges from the system's usual behavior. One of the main challenges in managing the security of large, high-speed networks is detecting suspicious anomalies in network traffic. An Intrusion Detection System is an important part of the security management system for computers and networks.

Fig 1. Intrusion Detection System
Researchers have developed two main approaches for intrusion detection: 1. misuse detection and 2. anomaly detection. Misuse detection targets specific known types of intrusions that exploit weaknesses in the system or violate its security policies. Anomaly detection, on the other hand, assumes that all intrusive activities are necessarily anomalous. This means that if we can create a profile of the system's normal activity, we can, in theory, flag all system states that deviate from this profile as intrusion attempts.

Fig 2. Detection Technology

Naive Bayesian classifiers [4] use Bayes' theorem to classify new instances of a data sample X. Each instance is a set of attribute values described by a vector, X = (x1, x2, ..., xn). Considering m classes, the sample X is assigned to the class Ci if and only if P(X|Ci)P(Ci) > P(X|Cj)P(Cj) for all i and j in (1, m) such that j != i. The sample belongs to the class with the maximum posterior probability [3]. For categorical data, P(Xk|Ci) is calculated as the ratio of the frequency of value Xk for attribute Ak among the training samples of class Ci to the number of training samples of class Ci. For continuous-valued attributes, a Gaussian distribution can be assumed without loss of generality. In the naive Bayesian approach, the attributes are assumed to be conditionally independent given the class. In spite of this assumption, naive Bayesian classifiers give satisfactory results because the focus is on identifying the classes of the instances, not the exact probabilities. Applications such as spam mail classification [3] and text classification can use naive Bayesian classifiers. Theoretically, Bayesian classifiers are the least prone to errors. Their limitation is the requirement of prior probabilities: the amount of probability information required is exponential in the number of attributes, the number of classes, and the maximum cardinality of the attributes.
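The classification rule above can be sketched as a small frequency-counting classifier. This is a minimal illustration, not the paper's implementation; the toy (protocol, flag) events and their labels are hypothetical.

```python
from collections import Counter, defaultdict

class NaiveBayes:
    """Categorical naive Bayes: assign X to the class maximizing
    P(Ci) * prod_k P(xk | Ci), with probabilities estimated by counting."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_n = Counter(y)
        self.priors = {c: self.class_n[c] / len(y) for c in self.classes}
        # cond[c][k] counts the values of attribute k within class c
        self.cond = {c: defaultdict(Counter) for c in self.classes}
        for xi, c in zip(X, y):
            for k, v in enumerate(xi):
                self.cond[c][k][v] += 1
        return self

    def predict(self, x):
        def score(c):
            s = self.priors[c]
            for k, v in enumerate(x):
                # frequency of value v for attribute k within class c
                s *= self.cond[c][k][v] / self.class_n[c]
            return s
        return max(self.classes, key=score)

# Hypothetical toy network events: (protocol, flag) -> label
X = [("tcp", "SYN"), ("tcp", "ACK"), ("udp", "SYN"), ("tcp", "SYN")]
y = ["attack", "normal", "attack", "attack"]
clf = NaiveBayes().fit(X, y)
print(clf.predict(("tcp", "SYN")))  # -> attack
```

A production version would also smooth the counts (e.g. Laplace smoothing) so that an unseen attribute value does not zero out the entire product.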
With an increase in the number of classes or attributes, the space and computational complexity of Bayesian classifiers grows exponentially. Network security consists of the provisions and policies adopted by a network administrator to prevent and monitor unauthorized access, misuse, modification, or denial of a computer network and network-accessible resources. Network security involves the authorization of, and controlled access to, data in a network. It is a complicated subject, historically tackled only by well-trained and experienced experts. Misuse/signature detection systems are based on supervised learning: during the learning phase, labeled examples of network packets or system calls are provided, from which the algorithm learns about the threats. This is a very efficient and fast way to find known threats. Nevertheless, there are some important drawbacks, namely false positives, novel attacks, and the complication of obtaining initial data for training the system. False positives happen when a normal network flow or system call is marked as a threat; for example, a user may fail to provide the correct password three times in a row, or start using a service in a way that deviates from their standard profile. A novel attack can be defined as an attack not yet seen by the system, meaning that the signature or pattern of the attack has not been learned and the system will be penetrated without the knowledge of the
administrator. The latter obstacle (the training dataset) can be overcome by collecting data over time or by relying on public datasets. Anomaly/outlier detection systems look for deviations from normal or established patterns within the given data; in the case of network security, any such deviation will be marked as a potential threat.

II. HIDDEN NAIVE BAYES CLASSIFIERS
An extended version of the naive Bayesian classifier is the hidden naive Bayes (HNB) classifier, which relaxes the conditional independence assumption imposed by the naive Bayesian model. The HNB model relies on the creation of an additional layer that represents a hidden parent of each attribute. The hidden parent combines the influences from all of the other attributes.

Fig. 3 HNB Structure

In the HNB model, each attribute Ai (i = 1, 2, ..., n) has a hidden parent Ahpi that represents the weighted influences from all of the other attributes, as shown with the dashed circles. The joint distribution is defined as

P(A1, ..., An, C) = P(C) * prod_{i=1..n} P(Ai | Ahpi, C),

where

P(Ai | Ahpi, C) = sum_{j=1..n, j != i} Wij * P(Ai | Aj, C).

The HNB classifier can then be defined as

c(E) = argmax_{c in C} P(c) * prod_{i=1..n} P(ai | ahpi, c).
One approach for determining the weights Wij, where i, j = 1, 2, ..., n and i != j, uses the conditional mutual information between the two attributes Ai and Aj as the weight of P(Ai | Aj, C):

Wij = Ip(Ai; Aj | C) / sum_{j=1..n, j != i} Ip(Ai; Aj | C),

where Ip(Ai; Aj | C) is the conditional mutual information, defined as

Ip(Ai; Aj | C) = sum_{ai, aj, c} P(ai, aj, c) * log[ P(ai, aj | c) / (P(ai | c) * P(aj | c)) ].

Fig 4. Naive Bayes Classifier

The HNB method is based on the idea of creating a hidden parent for each attribute. The influences from all of the other attributes can be combined through conditional mutual information by estimating the parameters from the training data. Although including the influence of complex attribute dependencies in large datasets is a promising idea, no previous studies have applied this model to the intrusion detection domain. The required probabilities are estimated from the training data as

P(c) = (F(c) + 1/k) / (t + 1),
P(ai | c) = (F(ai, c) + 1/ni) / (F(c) + 1),
P(ai | aj, c) = (F(ai, aj, c) + 1/ni) / (F(aj, c) + 1),

where F( ) is the frequency with which a combination of terms appears in the training data, t is the number of training examples, k is the number of classes, and ni is the number of values of attribute Ai.
III. NETWORK SECURITY FOR IDS
Networks are computer networks [1], both public and private, that are used every day to conduct transactions and communications among businesses, government agencies, and individuals. Networks are comprised of "nodes", which are "client" terminals (individual user PCs) and one or more "servers" and/or "host" computers. They are linked by communication systems, some of which might be private, such as within a company, while others might be open to public access [1]. The obvious example of a network system that is open to public access is the Internet, but many private networks also utilize publicly accessible communications. Today, most companies' host computers [1] can be accessed by their employees, whether in their offices over a private communications network, or from their homes or hotel rooms while on the road through normal telephone lines. Network security involves all activities that organizations, enterprises, and institutions undertake to protect the value and ongoing usability of assets and the integrity and continuity of operations. An effective network security strategy requires identifying threats and then choosing the most effective set of tools to combat them. With the tremendous growth of network-based [2] services and sensitive information on networks, network security is becoming more important than ever before. Intrusion detection techniques are the last line of defense against computer attacks [2], behind secure network architecture design, firewalls, and personnel screening. Despite the plethora of intrusion prevention techniques available, attacks against computer systems are still successful. Thus, intrusion detection systems (IDSs) [2] play a vital role in network security.

Fig 5. Network Topology

A network is an information system [1] implemented with a collection of interconnected components.
Such components may include routers, hubs, cabling, telecommunications controllers, key distribution centers, and technical control devices. Network security [2] refers to any activities designed to protect your network. Specifically, these activities protect the usability, reliability, integrity, and safety of your network and data. Effective network security [1] targets a variety of threats and stops them from entering or spreading on your network. Intrusion poses a serious security risk in a network environment, and the ever-growing number of new intrusion types poses a serious problem for their detection. The human labeling of the available network audit data instances is usually tedious, time-consuming, and expensive; hence the use of network intrusion detection systems (NIDS), which detect attacks by observing various network activities. It is therefore crucial that such systems are accurate in identifying attacks, quick to train, and generate as few false positives as possible. The Internet has become a part and parcel of daily life. Current Internet-based information processing systems are prone to different kinds of
threats, which lead to various types of damage resulting in significant losses. Therefore, the importance of information security is growing quickly. The most basic goal of information security is to develop defensive information systems that are secure from unauthorized access, use, disclosure, disruption, modification, or destruction. Moreover, information security minimizes the risks related to the three main security goals, namely confidentiality, integrity, and availability. Various systems have been designed in the past to identify and block Internet-based attacks. The most important among them are intrusion detection systems (IDSs), since they resist external attacks effectively. Moreover, IDSs provide a wall of defense against attacks on computer systems on the Internet. An IDS can be used to detect different types of attacks on network communications and computer system usage where a traditional firewall cannot perform well. Intrusion detection is based on the assumption that the behavior of intruders differs from that of a legal user [1]. Generally, IDSs are broadly classified into two categories, namely anomaly and misuse detection systems, based on their detection approaches [2,3]. Anomaly intrusion detection determines whether deviations from the established normal usage patterns can be flagged as intrusions. Misuse detection systems, on the other hand, detect violations of permissions effectively. Intrusion detection systems can be built using intelligent agents and classification techniques. Most IDSs work in two phases, namely a preprocessing phase and an intrusion detection phase, and the intrusions they identify can then be prevented effectively. Intrusion detection starts with the instrumentation of a computer network for data collection. Pattern-based software sensors monitor the network traffic and raise alarms when the traffic matches a saved pattern.
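A pattern-based sensor of the kind just described can be sketched as a small signature matcher over packet payloads. The signature names and regular expressions below are hypothetical placeholders, not real IDS rules.

```python
import re

# Hypothetical signature database: rule name -> regex over a raw packet payload.
SIGNATURES = {
    "suspicious-login": re.compile(rb"failed password", re.IGNORECASE),
    "shellcode-nop-sled": re.compile(rb"\x90{8,}"),  # 8+ consecutive NOP bytes
}

def scan_packet(payload: bytes):
    """Return the names of all signatures the payload matches."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(payload)]

alerts = scan_packet(b"Oct 10 sshd[42]: Failed password for root")
print(alerts)  # -> ['suspicious-login']
```

Real sensors such as Snort or Suricata add per-protocol decoding, stateful stream reassembly, and rule options beyond a single regex, but the raise-alarm-on-match control flow is the same.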
Security analysts decide whether these alarms indicate an event serious enough to warrant a response. A response might be to shut down a part of the network, to phone the Internet service provider associated with the suspicious traffic, or simply to make a note of unusual traffic for future reference. One proposed network security architecture applies data mining technologies to analyze the alerts collected from distributed intrusion detection and prevention systems (IDS/IPS). This defense-in-depth architecture consists of a global policy server (GPS) that manages the scattered intrusion detection and prevention systems, each of which is managed by a local policy server (LPS). The key component of the GPS is the security information management (SIM) module, where data mining technology is employed to analyze the events (alerts) collected from the LPSs. Once a DDoS attack is recognized by the SIM module, the GPS informs the LPSs (IDS/IPS) to adjust their thresholds immediately to block the attack at its sources. To evaluate the effectiveness of the proposed defense-in-depth architecture, a prototype was implemented in which three different data mining tools were employed. Experimental results demonstrate that, for detecting DDoS attacks, the proposed data mining-based defense-in-depth architecture performs very well on attack detection rate and false alarm rate.

IV. VARIOUS CLASSIFICATION ALGORITHMS
1. J48: The J48 classifier is a simple C4.5 decision tree for classification that creates a binary tree. The decision tree approach is very helpful in classification problems. Under this technique, a tree is constructed that models the classification process. Once the tree is formed, it is applied to each tuple in the database, yielding a classification for that tuple [15]. The J48 decision tree classifier follows a simple algorithm.
In order to classify a new item, it first needs to create a decision tree based on the attribute values of the available training data. Whenever it encounters a set of items (a training set), it identifies the attribute that discriminates among the various instances most clearly. This feature, which tells us the most about the data instances so that we can classify them best, is said to have the highest information gain. Then, among the possible values of this feature, if there is any value for which there is no ambiguity, that is, for which the data instances falling within
its category have the same value for the target variable, we terminate that branch and assign to it the target value that we have obtained.
2. Naive Bayes: A naive Bayes classifier assumes that the value of a particular feature is unrelated to the presence or absence of any other feature, given the class variable. For example, a fruit may be considered to be an apple if it is red, round, and about 3" in diameter. A naive Bayes classifier considers each of these features to contribute independently to the probability that the fruit is an apple, regardless of the presence or absence of the other features. In other words, the naive Bayes algorithm is a simple probabilistic classifier that calculates a set of probabilities by counting the frequencies and combinations of values in a given data set. The algorithm uses Bayes' theorem and assumes all attributes to be independent given the value of the class variable. The algorithm tends to perform well and learn rapidly in a variety of supervised classification problems [16].
3. BayesNet: A Bayesian network, or probabilistic directed acyclic graphical model, is a probabilistic graphical model (a type of statistical model) that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Graphical models such as Bayesian networks provide a general framework for dealing with uncertainty in a probabilistic setting and are thus well suited to problems such as churn management. The term "Bayesian networks" was coined by Pearl (1985). In a Bayesian network, every graph encodes a class of probability distributions. The nodes of the graph correspond to the variables of the problem domain, and arrows between nodes show relations between the variables. These dependencies are quantified by conditional distributions for every node given its parents. 4.
ZeroR: The simplest of the rule-based classifiers is the majority class classifier, called 0-R or ZeroR in Weka. The 0-R (zero rule) classifier looks at the target attribute and its possible values and always outputs the value that is most commonly found for the target attribute in the given dataset. As its name suggests, 0-R does not include any rule that works on the non-target attributes. More specifically, it predicts the mean (for a numeric target attribute) or the mode (for a nominal target attribute). ZeroR is a simple and trivial classifier, but it gives a lower bound on the performance on a given dataset, which should be significantly improved upon by more complex classifiers. As such, it is a reasonable test of how well the class can be predicted without considering the other attributes, and it can be used as a lower bound on performance [15].
5. Ridor: The Ridor algorithm first generates a default rule, and then the exceptions to the default rule with the least (weighted) error rate. It then generates the best exceptions for each exception and iterates until pure, thus performing a tree-like expansion of exceptions. The exceptions are a set of rules that predict classes other than the default; IREP is used to generate them. It is well known that classification models produced by Ripple Down Rules (RDR) are easy to update and maintain. They are compact and capable of providing an explanation of their reasoning, which makes them easy for medical practitioners to understand. Ripple Down Rules were initially introduced as an approach that eased the maintenance problem in knowledge-based systems, and their applications in various domains have been actively investigated. Multiple Classification Ripple Down Rules (MCRDR) are of particular interest for medical applications, since they are capable of producing multiple conclusions for each instance, which may correspond to several diagnoses for one patient [16]. 6.
PART: The PART algorithm combines two general data mining strategies: the divide-and-conquer strategy for decision tree learning and the separate-and-conquer strategy for rule learning. In the divide-and-conquer approach, an attribute is placed at the root node and the tree is then divided by making branches for each possible value of the attribute. For each branch the same process is carried out recursively, using only those
instances that reach the branch. In order to build the rules, the separate-and-conquer strategy is employed: a rule is derived from the branch of the decision tree explaining the most cases in the dataset, the instances covered by the rule are removed, and the algorithm continues creating rules recursively for the remaining instances until none are left [16].
7. Prism: The Prism algorithm was introduced by Cendrowska. The Prism classification rule induction algorithm promises to induce qualitatively better rules than the traditional TDIDT (Top-Down Induction of Decision Trees) algorithm. Compared with decision trees, Prism is less vulnerable to clashes, it is biased towards leaving a test record unclassified rather than giving it a wrong classification, and it often produces many fewer terms than the TDIDT algorithm when there are missing values in the training set. The algorithm generates the rules concluding each of the possible classes in turn. Each rule is generated term by term, with each term of the form attribute = value. The attribute/value pair added at each step is chosen to maximize the probability of the target outcome class.
8. CBA (Classification Based on Association): Association rules are used to analyse relationships between data in large databases. Classification involves learning a function that is capable of mapping instances into distinct classes. Association rule mining and classification rule mining can be integrated to form a framework called Associative Classification, and the resulting rules are referred to as Class Association Rules. The use of association rules for classification is restricted to problems where the instances can only belong to a discrete number of classes, because association rule mining is only possible for categorical attributes. However, association rules in their general form cannot be used directly; we have to restrict their definition.
When we want to use rules for classification, we are interested in rules that are capable of assigning a class membership. A class association rule is therefore a predictive task, and by using the discriminative power of Class Association Rules we can also build a classifier [16].

V. CONCLUSION
The Hidden Naive Bayes multiclass classification model applies data mining methods to network events in order to classify them as attack events or normal events. We reviewed the performance improvements of the naive Bayes model in data mining and introduced the HNB model as a solution to the intrusion detection problem. We augmented the naive Bayes and structurally extended naive Bayes methods with leading discretization and feature selection methods to increase the accuracy and decrease the error rate and resource requirements of the intrusion detection problem, and we compared the performance of the naive Bayes and leading extended naive Bayes approaches with the new HNB approach as an intrusion detection system. The hidden naive Bayes multiclass classification model, augmented with various discretization and feature selection methods, shows better overall results in terms of detection accuracy, error rate, and misclassification cost than the traditional naive Bayes model and the leading extended naive Bayes models on the KDD 99 dataset. In particular, this model significantly improves the detection of denial-of-service (DoS) attacks compared with the other models. Considering its simplicity and its advantage over the naive Bayes model's conditional independence assumption, hidden naive Bayes is a promising model for datasets with dependent attributes, such as the KDD 99 intrusion detection dataset.

VI. FUTURE WORK
Two areas are identified for future work. First, the same research framework can be used to study the effects of using the hidden naive Bayes model as a binary classifier instead of a multiclass classifier. This classifier
might provide better detection performance, but the learned information will be limited, since it will only indicate whether a network event is an attack event or a normal event. Second, the framework can be modified to study the effects of a multi-classifier model, which consists of the best algorithms for each attack category, to obtain better overall detection for the specific attack categories.

REFERENCES
[1] P. Wu, H. Changzheng, Y. Shuping and W. Zhigang, A Dynamic Intrusive Intention Recognition Method Based on Timed Automata, Journal of Computer Research and Development, vol. 48, no. 7, (2011), pp. 1288-1297.
[2] M. Bratman, Intentions, Plans, and Practical Reason, Massachusetts: Harvard University Press, (1987).
[3] W. Stallings, Cryptography and Network Security: Principles and Practices (Prentice Hall, Upper Saddle River, 2006).
[4] J. Anderson, An Introduction to Neural Networks (MIT Press, Cambridge, 1995).
[5] A. Guttman, R-trees: A Dynamic Index Structure for Spatial Searching, Proc. SIGMOD '84, pp. 47-57.
[6] Y. Z. Li, J. S. Luo, Y. Sun, Architecture Study of Intrusion Detection System Based on Mobile Agent, Journal of Computer Research and Development, 2006, 43: 296-301.
[7] H. R. Wang, R. F. Ma, Optimization of Neural Networks for Network Intrusion Detection, First International Workshop on Education Technology and Computer Science (ETCS09), USA, 2009: 418-420.
[8] E. Eskin, A. Arnold, M. Prerau, et al., A Geometric Framework for Unsupervised Anomaly Detection.
[9] Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, San Francisco, California.
[10] Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu, A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining.
[11] Biswanath Mukherjee, L. Todd Heberlein, Karl N. Levitt, Network Intrusion Detection, IEEE, June 1994.
[12] Presentation on Intrusion Detection Systems, Arian Mavriqi.
[13] Intrusion Detection Methodologies Demystified, Enterasys Networks.
[14] W. Lee, S. J. Stolfo, K. W. Mok, A data mining framework for building intrusion detection models, in: Proceedings of the IEEE Symposium on Security and Privacy, 1999, pp. 120-132.
[15] W. Feng, Q. Zhang, G. Hu, J. Xiangji Huang, Mining network data for intrusion detection through combining SVMs with ant colony networks, Future Generation Computer Systems, 2013.
[16] Tina R. Patil and S. S. Sherekar, Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification, International Journal of Computer Science and Applications, vol. 2, no. 6, 2013, pp. 256-261.
[17] Inamdar S. A., Narangale S. M. and Shinde G. N., Preprocessor Agent Approach to Knowledge Discovery Using Zero-R Algorithm, International Journal of Advanced Computer Science and Applications, vol. 2, no. 12, 2011, pp. 82-84.