King Saud University
Computer Science College
CSC590 Selected Topic
A Literature Review on Intrusion Detection Systems using Genetic Algorithms
Phase # 5
By: Lamees Alhazzaa
ID: 426221091
Submitted to: Dr. Hassan Mathkour
Abstract
This paper presents a general overview of Intrusion Detection Systems (IDS) and the methods used in these systems, briefly covering their design principles and major trends. Artificial intelligence techniques such as fuzzy logic and genetic algorithms are widely used in this area. This paper focuses on the genetic algorithm technique and how it can be used in Intrusion Detection Systems, giving some examples of systems and experiments proposed in this field. The purpose of this paper is to give a clear understanding of the use of Genetic Algorithms in IDS.
Table of Contents
Table of Figures
Introduction
Chapter 1 Intrusion Detection Systems (IDS)
Intrusion Detection
IDS Design Principles
IDS Design Trends
Chapter 2 Artificial Intelligence and Intrusion Detection
Artificial Intelligence Systems
1. ANN (Artificial Neural Networks) Approach
2. Fuzzy Logic
3. Genetic Algorithms
Chapter 3 Using Genetic Algorithms in Intrusion Detection Systems
A Generic GA-based intrusion detection approach
Chapter 4 Proposed experiments on using GA in IDS
Related work
Some proposed systems
Conclusion
References
Table of Figures
Figure 1: Structure of Simple GA [16]
Figure 2: The operation of GA [8]
Figure 3: Algorithm [8]
Figure 4: Results of Experiment [28]
Introduction
With the information boom in the business world and the fast pace of communication, network systems and technologies have become a major concern, as they must keep up with the rapid flow of information and communication anywhere in the world. The growing scale of networks, the development of advanced information technologies and other factors increase the number of possible targets for attacks against computer networks. Hacking, viruses, worms and Trojan horses are some of the major attacks that threaten network systems. At the same time, the increasing dependency on computer networks has increased the need to secure the information that can be reached through them. Along with conventional security tools such as firewalls, intrusion detection systems (IDS) are becoming extremely important. Intrusion Detection Systems are a new direction in security systems that provides efficient approaches to securing computer networks. Artificial intelligence approaches have been used extensively to build IDS, and some of these approaches rely on genetic algorithms to provide the network with an efficient classifier that recognizes and detects intrusion actions. In the following sections, IDS are briefly defined and explained; the next section briefly covers artificial intelligence systems as an introduction to the following section, which describes the use of genetic algorithms in intrusion detection. Finally, all of this is put together by presenting a case study of a system that applies GA to IDS.
Chapter 1
Intrusion Detection Systems (IDS)
Intrusion Detection
The first line of defence in securing a networked system is detecting an attack, i.e. Intrusion Detection (ID). Techniques from several areas of computer and information systems have been applied to intrusion detection; they fall mainly into the following areas:
1. Artificial Intelligence
Most of the basic and most effective methods and research in intrusion detection rely on artificial intelligence systems and ideas, such as the following:
a. Many researchers apply rule-based methods to intrusion detection, such as data mining with association rules.
b. Fuzzy logic concepts have also attracted major interest. Some researchers used multi-disciplinary approaches, for example combining fuzzy logic, genetic algorithms and association rule techniques in their work.
c. The ANN (Artificial Neural Networks) approach provides an unsupervised classification method that mitigates the problems of high dimensionality when there is a large number of input features. (This is discussed further in Chapter 2.)
2. Software Engineering
Many papers focus on how to implement and develop intrusion detection and propose frameworks in this area, but IDS implementations basically fall into two main approaches:
a. Software implementation, where the IDS is deployed on the host or server. This is cost effective but increases the overhead on the host processor.
b. Hardware implementation, where the IDS runs on a hardware platform to monitor and analyse intrusions in the network. This is more efficient and accurate but somewhat expensive.
3. Embedded Programming
To reduce the load on the IDS, embedded hardware components can be added to pre-process information about the network, for example by programming the Network Interface Card to detect some major attacks such as the Denial of Service (DoS) attack.
4. Distributed or Agent-Based Intrusion Detection [6][35]
This approach divides the workload among distributed machines in the network, while also allowing the IDS to obtain overall knowledge of the network's working condition. As a result, it can detect intrusions more accurately and, at the same time, respond to threats more effectively. For example, an agent in the intrusion detection system can use a machine learning approach to automatically discover concise rules from system call traces: a rule learning algorithm induces rules that are then used to monitor the system and detect potential intrusions.
IDS Design Principles
IDS are designed and implemented based on models of networked systems. Several points should be defined in advance in order to find a proper model for the network:
Normal behaviour of a network system is the most dominant and frequent behaviour of the network in a certain time period.
An anomaly within the network system is the least frequent and abnormal behaviour of the network in a certain time period.
Modelling a dynamic and complex system such as a network is very difficult; for this reason, abstraction and partial modelling are used as a practical solution. The network components can be divided into host, user and network environment, and users can in turn be divided into legitimate users and malicious users (intruders). Many other nested divisions are possible, depending on the designer's point of view and areas of focus.
An intrusion detection system basically raises an alarm whenever an anomalous event occurs, since such an event could be caused by an intruder. These systems are not perfectly accurate: sometimes an alarm is raised for legitimate activity, which is called a False Positive (FP). The lower the FP rate, the more valuable the IDS. [6][23]
IDS Design Trends
There are a number of different ways to classify IDS in order to distinguish between their types. The most generic classification I found for IDS is by:
1. Placement of IDS
2. Analysis approach
Several classifications can occur under each of these categories. [1]
1. Placement of IDS
De Boer and Pels in [11] gave three types of IDS that can be listed under this category:
NIDS: Network-based IDS, which monitors the network for malicious traffic.
HIDS: Host-based IDS, which monitors the activities of a single host.
DIDS: Distributed IDS, which correlates events from different host-based or network-based IDS.
They mainly focused on HIDS and described four common methods used in that area: file system monitors, log file analysis, connection analysis and kernel-based IDS. For each method they gave the features, installation and maintainability details, techniques for evading the IDS, and their own perspective. A similar division is mentioned in [6], but most of the other references, such as [5] and [22], do not consider DIDS.
2. Analysis Approach
In this respect IDS are usually divided into:
SIDS: Signature-based IDS, which studies attack patterns and defines a signature for each attack, so that matching activity can be detected and security specialists can design a defence against that attack.
AIDS: Anomaly-based IDS, which learns the usual behaviour patterns of a network and suspects an attack once an anomaly occurs.
These two types are the best known. Some researchers have proposed a hybrid approach that combines the benefits of both and focuses on reducing the rate of FP alarms. [6] also mentions another, recently introduced type, Specification-based IDS, which specifically focuses on reducing the number of FP alarms.
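To make the signature-based and anomaly-based approaches, and the False Positive (FP) measure from the design principles, more concrete, the following is a minimal Python sketch. It assumes a hypothetical event representation of (protocol, destination port) pairs, a hand-written signature set, and a frequency threshold for "rare" behaviour, echoing the principle that normal behaviour is the most frequent; none of these names or values come from the cited systems.

```python
from collections import Counter

# Hypothetical events: (protocol, destination port) pairs
KNOWN_SIGNATURES = {("tcp", 23), ("udp", 31337)}  # assumed attack signatures


def signature_detect(event):
    """Signature-based: alarm only if the event matches a known attack signature."""
    return event in KNOWN_SIGNATURES


def train_anomaly_profile(normal_events):
    """Anomaly-based: learn the relative frequency of each event in normal traffic."""
    counts = Counter(normal_events)
    total = len(normal_events)
    return {event: n / total for event, n in counts.items()}


def anomaly_detect(profile, event, min_frequency=0.05):
    """Alarm on events that are rare or never seen in the normal profile."""
    return profile.get(event, 0.0) < min_frequency


def false_positive_rate(alarms, is_attack):
    """FP rate = alarms raised on normal events / number of normal events."""
    normal_alarms = [alarm for alarm, attack in zip(alarms, is_attack) if not attack]
    return sum(normal_alarms) / len(normal_alarms) if normal_alarms else 0.0


# Toy "normal" traffic used to learn the anomaly profile (assumed data)
normal_traffic = [("tcp", 80)] * 70 + [("tcp", 443)] * 20 + [("udp", 53)] * 10
profile = train_anomaly_profile(normal_traffic)

test_events = [("tcp", 80), ("udp", 53), ("tcp", 23), ("tcp", 8080)]
is_attack = [False, False, True, False]

sig_alarms = [signature_detect(e) for e in test_events]
ano_alarms = [anomaly_detect(profile, e) for e in test_events]
print(sig_alarms, false_positive_rate(sig_alarms, is_attack))
print(ano_alarms, false_positive_rate(ano_alarms, is_attack))
```

On this toy data the signature detector raises no false alarms but would miss any attack outside its signature set, while the anomaly detector also flags the unseen but legitimate port 8080, which counts as a false positive.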
Chapter 2
Artificial Intelligence and Intrusion Detection
Artificial Intelligence Systems
There is no basic, simple or agreed-upon strict definition of artificial intelligence; however, as a general definition, artificial intelligence is the science and engineering of making intelligent machines, especially intelligent computer programs. [2] Human biological intelligence has inspired system security designers and researchers to build artificial intelligence systems that emulate the defence mechanisms of the human immune system. Such systems have been tested and developed, relying on algorithms and intelligent techniques that use knowledge of past intrusions to improve the system's detection. [1][2][12] Regarding intrusion detection, most of the available research focuses on three types of artificial intelligence methods and techniques.
1. ANN (Artificial Neural Networks) Approach
An Artificial Neural Network (ANN) is a set of simple processing elements modelled on animal neurons. Neurons are connected by communication channels, and information flows between nodes in the form of numerical data. [2] The connections between nodes carry weights that change over time, which makes the ANN a dynamic environment. Similar to the human nervous system, many inputs can enter a neuron in parallel; the inputs are multiplied by their weights and summed, and the result is passed to a transfer function that gives the final output. ANN learning techniques are mainly divided into supervised and unsupervised, according to the learning method used. In supervised learning the network should reach a desired output; if it does not, the algorithms built into the ANN adjust the weights until it reaches the expected output. Unsupervised learning is the opposite: the network is given a set of inputs but no correct output.
In the case of IDS, these learning techniques are used to increase the system's intelligence in distinguishing between normal and intruder behaviours.
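The neuron and the supervised learning loop described above can be sketched minimally in Python. The sigmoid transfer function, the perceptron update rule, the two hypothetical normalized features (for example a failed-login rate and a connection rate) and the toy data are assumptions chosen for illustration, not the architecture of any system cited here.

```python
import math


def neuron_output(weights, bias, inputs):
    """One neuron: weighted sum of the inputs passed through a sigmoid transfer function."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-net))


def predict(weights, bias, inputs):
    """Step transfer function: 1 = intrusion, 0 = normal."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if net > 0 else 0


def train_perceptron(samples, labels, learning_rate=0.1, epochs=50):
    """Supervised learning: adjust weights until outputs match the desired labels."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)  # compare with desired output
            if error:
                errors += 1
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if errors == 0:  # every training sample now gives the desired output
            break
    return weights, bias


# Hypothetical normalized features: [failed_login_rate, connection_rate]
samples = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
labels = [0, 0, 1, 1]  # 0 = normal, 1 = intrusion

weights, bias = train_perceptron(samples, labels)
print([predict(weights, bias, x) for x in samples])
```

The perceptron keeps adjusting its weights while any training sample is misclassified, mirroring the supervised loop described above; an unsupervised method would instead receive only the inputs, with no labels.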
2. Fuzzy Logic
Fuzzy logic was introduced as a means of modelling the uncertainty of natural language. Due to the uncertain nature of intrusions, fuzzy sets are widely used to discover attack events while reducing the rate of false alarms. Basically, intrusion detection systems distinguish between two distinct types of behaviour, normal and abnormal, which create two distinct sets of rules and information. Fuzzy logic can create sets with in-between values, where the boundary between the two sets is not sharply defined. In this case, instead of the crisp AND, OR and NOT operations in if-then-else conditions, the logic works on degrees of membership, typically taking the minimum of the degrees for AND, the maximum for OR, and the complement (1 minus the degree) for NOT. This feature strongly contributes to reducing the false positive alarm rate of the system. [12][24]
3. Genetic Algorithms
A Genetic Algorithm (GA) is a programming technique that mimics biological evolution as a problem-solving strategy. [2] It is based on Darwin's principle of evolution and survival of the fittest to optimize a population of candidate solutions towards a predefined fitness. [16][18] A GA applies evolution and natural selection to a chromosome-like data structure, evolving the chromosomes using selection, recombination and mutation operators. The process usually begins with a randomly generated population of chromosomes, which are candidate solutions of the problem. Different positions of each chromosome are encoded as bits, characters or numbers; these positions are referred to as genes. An evaluation function, known as the fitness function, is used to calculate the goodness of each chromosome according to the desired solution. During evolution, two basic operators, crossover and mutation, are used to simulate the natural reproduction and mutation of species. The selection of chromosomes for survival and combination is biased towards the fittest chromosomes. [16][18][20][31][3]
The following figure, taken from [16], shows the structure of a simple genetic algorithm. It starts with a randomly generated initial population, which is then evaluated and evolved through selection, recombination and mutation. Finally, the best individual (chromosome) is picked as the final result once the optimization meets its target (Pohlheim, 2001).
Figure 1: Structure of Simple GA [16]
Many authors and researchers are drawn to genetic algorithms as a strong and efficient method used in different fields of artificial intelligence, noting that several AI techniques can be combined in different ways in different systems for several purposes.
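The simple GA flow of Figure 1 can be sketched in Python as below. The sketch assumes a toy "OneMax" fitness (the count of 1-bits) in place of a real problem, plus tournament selection, single-point crossover, bit-flip mutation and elitism (carrying the best chromosome into the next generation); these are common textbook choices, not necessarily those used in [16].

```python
import random


def fitness(chromosome):
    # OneMax: number of 1-bits; stands in for any problem-specific fitness function
    return sum(chromosome)


def tournament_select(population, rng, k=3):
    # Selection biased toward fitter chromosomes: the best of k random individuals
    return max(rng.sample(population, k), key=fitness)


def crossover(parent_a, parent_b, rng):
    # Single-point recombination
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(chromosome, rng, rate):
    # Flip each gene (bit) with a small probability
    return [1 - g if rng.random() < rate else g for g in chromosome]


def simple_ga(length=20, pop_size=30, max_generations=200, mutation_rate=0.02, seed=1):
    rng = random.Random(seed)
    # 1. Random initial population of candidate solutions
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for generation in range(max_generations):
        # 2. Evaluate; stop once the optimization target is met
        best = max(population, key=fitness)
        if fitness(best) == length:
            return best, generation
        # 3. Evolve: keep the best (elitism), fill the rest by selection,
        #    recombination and mutation
        next_population = [best]
        while len(next_population) < pop_size:
            child = crossover(tournament_select(population, rng),
                              tournament_select(population, rng), rng)
            next_population.append(mutate(child, rng, mutation_rate))
        population = next_population
    # 4. Return the best individual found
    return max(population, key=fitness), max_generations


best, generations = simple_ga()
print(fitness(best), generations)
```

Elitism is an added design choice here: it guarantees that the best fitness never decreases from one generation to the next.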
Chapter 3
Using Genetic Algorithms in Intrusion Detection Systems
In a GA-based IDS, the genetic algorithm is employed to derive a set of classification rules from network audit data, and the support-confidence framework is used as the fitness function to judge the quality of each rule. The generated rules are then used to detect or classify network intrusions in a real-time environment. [8]
A Generic GA-based intrusion detection approach
To summarize the AI-based IDS presented previously, the work of these systems is divided into two main stages. First comes the training stage, which provides the system with the information it initially requires; next comes the detection stage, where the system detects intrusions according to what it learned in the training stage. In a GA-based IDS, the GA evolves classification rules from previously collected network audit data during training, and the detection stage is applied in real time by classifying incoming network connections according to the generated rules. Many systems, both simple and advanced, have been proposed in the literature; to give a general idea of the components and basic mechanism of such a system, the following three components are highlighted:
1- Data Representation
Genes can be represented using different data types such as byte, integer and float, and they may have different value ranges and other features. The genes are generated randomly when the population is initialized. Genetic algorithms can be used to evolve rules for network traffic; these rules usually take the following form [16]:
If {condition} then {act}
Each rule is thus an if-then clause consisting of a condition and an act. The condition usually matches the current network behaviour against behaviour stored in the IDS, for example by comparing a connection's source IP address and port number with those of an intruder already stored in the system.
The act could be an alarm indicating that the intruder's IP address and port number belong to an attacker already known to the system. [16][8]
2- GA Parameters
A GA has some common elements and parameters that should be defined:
Fitness Function: according to [2], the fitness function is a function that scales the value of an individual relative to the rest of the population. It is used to identify the best solutions among the candidates in the population.
GA Operators: as the figure below shows, selection, crossover and mutation are the most influential parts of the algorithm, since they take part in generating each new population.
Figure 2: The operation of GA [8]
Selection is the phase where individuals with better fitness are selected to survive, while the others are discarded.
Crossover is a process where each randomly selected pair of individuals exchanges parts of their chromosomes (genes), until a completely new population has been generated.
Mutation flips some randomly chosen bits in an individual; because any bit may be flipped, the change cannot be predicted, which helps maintain diversity in the population.
3- Detection algorithm overview
In [8], a genetic algorithm with a training process is presented. The algorithm is designed to generate a set of classification rules from the given input data, and it follows the simple flow of genetic algorithms shown in Figure 2. [8][16][2]
Figure 3: Algorithm [8]
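The training and detection stages, the if-then rule form and the support-confidence fitness described in this chapter can be combined into one small Python sketch. Everything concrete here is an assumption for illustration: the toy audit records, the rule encoding as (source IP, destination port, protocol) with None as a wildcard, the fitness weights, truncation selection and the 0.8 threshold for keeping rules. It is not the algorithm of [8]. For a rule "if A then attack" over N records, support is |A and attack| / N and confidence is |A and attack| / |A|.

```python
import random

# Hypothetical labelled audit records: (src_ip, dst_port, protocol, is_attack)
AUDIT_DATA = (
    [("10.0.0.5", 23, "tcp", True)] * 8
    + [("10.0.0.9", 31337, "udp", True)] * 4
    + [("192.168.1.7", 80, "tcp", False)] * 20
    + [("192.168.1.8", 53, "udp", False)] * 10
)

# Allowed values for each condition gene; None is a "don't care" wildcard
GENE_VALUES = [
    [None, "10.0.0.5", "10.0.0.9", "192.168.1.7", "192.168.1.8"],
    [None, 23, 53, 80, 31337],
    [None, "tcp", "udp"],
]


def matches(rule, conn):
    """Condition part of 'if {condition} then alarm'."""
    return all(gene is None or gene == value for gene, value in zip(rule, conn[:3]))


def fitness(rule, data, w_support=0.2, w_confidence=0.8):
    """Support-confidence fitness; the weights are assumed values."""
    matched = [c for c in data if matches(rule, c)]
    hits = sum(1 for c in matched if c[3])
    support = hits / len(data)
    confidence = hits / len(matched) if matched else 0.0
    return w_support * support + w_confidence * confidence


def random_rule(rng):
    return tuple(rng.choice(values) for values in GENE_VALUES)


def crossover(a, b, rng):
    point = rng.randint(1, len(a) - 1)  # single-point crossover
    return a[:point] + b[point:]


def mutate(rule, rng, rate=0.1):
    # Replace each gene with a random allowed value with probability `rate`
    return tuple(rng.choice(GENE_VALUES[i]) if rng.random() < rate else gene
                 for i, gene in enumerate(rule))


def train(data, pop_size=40, generations=30, threshold=0.8, seed=7):
    """Training stage: evolve rules against labelled audit data."""
    rng = random.Random(seed)
    population = [random_rule(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda r: fitness(r, data), reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    # Keep the distinct rules whose fitness reaches the (assumed) threshold
    return sorted({r for r in population if fitness(r, data) >= threshold}, key=str)


def detect(rules, conn):
    """Detection stage: raise an alarm if any evolved rule matches the connection."""
    return any(matches(rule, conn) for rule in rules)


rules = train(AUDIT_DATA)
print(rules)
print(detect(rules, ("10.0.0.5", 23, "tcp")), detect(rules, ("192.168.1.7", 80, "tcp")))
```

Because all rules compete in a single population, a GA like this tends to converge on whichever rule has the highest fitness; keeping several distinct rules usually needs extra measures such as niching, or running the GA separately for each attack type.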
Chapter 4
Proposed experiments on using GA in IDS
Related work
In [16], a basic experiment strongly related to the subject of this review was proposed; the experiments were simple and used a narrow sample of data. Chittur [26] presented a generic model for applying genetic algorithms to IDS, which achieved an intrusion detection rate of about 97% with a significantly low rate of false alarms. Some papers presented models with slight improvements, such as tracing events in log files off-line to enhance the GA classification rules [7], or improving the data structure of the classifier, either linear [32] or tree-based [9]. General IDS applications and frameworks based on hybrid immune systems that combine several techniques were proposed in [30, 40], while combinations of two or more AI techniques, such as fuzzy data mining and genetic algorithms, were presented in [28, 29]. [25] improves systems that use fuzzy data mining and GA for IDS by applying a parallel GA to overcome the slowness caused by the evaluation process. A DIDS experiment using Genetic Programming ensembles was proposed in [10].
Some proposed systems
Several case studies and experiments have applied GA to IDS. Some went further by combining several AI methods into enhanced systems and testing them. In the following, the experiment presented by Susan M. Bridges and Rayford B. Vaughn in [28], which combines fuzzy data mining with genetic algorithms, is highlighted.
1. Fuzzy Data Mining and Genetic Algorithms Applied to Intrusion Detection
In this paper, a prototype Intelligent Intrusion Detection System was developed to demonstrate the effectiveness of data mining techniques that use fuzzy logic and genetic algorithms. The system provides a high degree of detection for both anomaly and misuse intrusions.
Genetic algorithms are used to tune the fuzzy membership functions to improve performance, and to select the set of features from the audit data that is supplied to the data mining components. The aim was to design an abstract IDS model that is adaptive, accurate and flexible. In this experiment, the GA was found to be very effective in selecting the set of features that identify different types of intrusions. The figure below summarizes the final results they reported.
Figure 4: Results of Experiment [28].
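[28] uses a GA to tune fuzzy membership functions. The toy Python sketch below only illustrates that general idea: the ramp-shaped membership function, the two hypothetical features (connection rate and number of distinct destination ports), the single fuzzy rule combined with minimum as AND, the data and the GA settings are all assumptions, not the configuration reported in [28].

```python
import random


def ramp_membership(x, a, b):
    """Degree to which x is 'high': 0 at or below a, 1 at or above b, linear in between."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def fuzzy_and(*degrees):
    return min(degrees)  # standard fuzzy AND (minimum)


# Hypothetical records: ((connections per second, distinct destination ports), is_attack)
SAMPLES = [((5, 2), False), ((8, 3), False), ((12, 4), False), ((30, 2), False),
           ((80, 40), True), ((95, 60), True), ((70, 35), True), ((40, 30), True)]


def accuracy(params):
    """Fitness: fraction of records classified correctly by the fuzzy rule."""
    a1, b1, a2, b2 = params
    correct = 0
    for (rate, ports), is_attack in SAMPLES:
        # IF rate is high AND ports is high THEN attack (alarm when degree >= 0.5)
        degree = fuzzy_and(ramp_membership(rate, a1, b1), ramp_membership(ports, a2, b2))
        correct += (degree >= 0.5) == is_attack
    return correct / len(SAMPLES)


def repair(params):
    """Keep each (a, b) pair ordered so that a <= b."""
    a1, b1 = sorted(params[:2])
    a2, b2 = sorted(params[2:])
    return (a1, b1, a2, b2)


def tune(pop_size=30, generations=40, seed=3):
    rng = random.Random(seed)
    population = [repair(tuple(rng.uniform(0, 100) for _ in range(4)))
                  for _ in range(pop_size)]
    initial_best = max(accuracy(p) for p in population)
    for _ in range(generations):
        elite = max(population, key=accuracy)
        next_population = [elite]  # elitism: the best parameters always survive
        while len(next_population) < pop_size:
            p1 = max(rng.sample(population, 3), key=accuracy)  # tournament selection
            p2 = max(rng.sample(population, 3), key=accuracy)
            w = rng.random()
            child = tuple(w * x + (1 - w) * y for x, y in zip(p1, p2))  # blend crossover
            child = tuple(g + rng.gauss(0, 5) if rng.random() < 0.2 else g
                          for g in child)  # Gaussian mutation
            next_population.append(repair(child))
        population = next_population
    best = max(population, key=accuracy)
    return best, accuracy(best), initial_best


best_params, best_accuracy, initial_accuracy = tune()
print(best_params, best_accuracy, initial_accuracy)
```

Here each chromosome is a vector of real-valued membership-function parameters rather than a bit string, which is a common encoding when a GA tunes continuous values.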
Conclusion
This review presented an overview of Intrusion Detection Systems, covering the different trends and technologies that can be used. Artificial intelligence methods are gaining the most interest nowadays because of their ability to learn and evolve, which makes them more accurate and efficient in facing the enormous number of unpredictable attacks. One major technique was highlighted: the use of genetic algorithms to provide system classifiers with extra intelligence. Although much research focuses on this area of IDS, other researchers argue that IDS of any type are not adequate for today's challenges, because they provide a reactive approach to system defence. In [13], a Proactive Process Monitor (PPM) approach is discussed, highlighting its ability to be proactive against attacks rather than reactive.
References
1 Aickelin, U., J. Greensmith, and J. Twycross. "Immune System Approaches to Intrusion Detection - A Review", Natural Computing, Springer, in print, 2007, pp XXX.
2 Bobor, V. "Efficient Intrusion Detection System Architecture Based on Neural Networks and Genetic Algorithms.", Department of Computer and Systems Sciences, Stockholm University / Royal Institute of Technology, KTH/DSV, 2006.
3 Faraoun, K. M., and A. Boukelif. "Genetic Programming Approach for Multi-Category Pattern Classification Applied to Network Intrusions Detection.", International Journal of Computational Intelligence, Vol. 3, No. 1, 2006, pp. 79-90.
5 Zhang, J., and M. Zulkernine. "Anomaly Based Network Intrusion Detection with Unsupervised Outlier Detection.", Symposium on Network Security and Information Assurance - Proc. of the IEEE International Conference on Communications (ICC), June 2006, Istanbul, Turkey.
6 Kabiri, P., and Ali A. Ghorbani. "Research on Intrusion Detection and Response: a Survey.", International Journal of Network Security, The Intelligent & Adaptive Systems Group (IAS), Vol. 1, No. 2, 4 July 2005, pp. 84-102.
7 Diaz-Gomez, P. A., and D. F. Hougen. "Improved Off-Line Intrusion Detection Using a Genetic Algorithm.", Proceedings of the Seventh International Conference on Enterprise Information Systems, 2005, Miami, USA.
8 Gong, R. H., M. Zulkernine, and P. Abolmaesumi. "A Software Implementation of a Genetic Algorithm Based Approach to Network Intrusion Detection.", Proceedings of the Sixth IEEE ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing (SNPD), May 2005, Maryland, USA.
9 Stein, G., B. Chen, A. S. Wu, and Kien A. Hua. "Decision Tree Classifier for Network Intrusion Detection with GA-based Feature Selection.", Proceedings of the 43rd ACM Southeast Conference, March 18-20, 2005, Kennesaw, GA.
10 Folino, G., C. Pizzuti, and G. Spezzano. "GP Ensemble for Distributed Intrusion Detection Systems.", International Conference on Advances in Pattern Recognition (ICAPR05), August 22-25, 2005, Bath, UK.
11 De Boer, P., and Martin Pels. "Host-Based Intrusion Detection Systems.", Technical Report 1.10, Faculty of Science, Informatics Institute, University of Amsterdam, 2005.
12 Yao, J. T., S. L. Zhao, and L. V. Saxton. "A Study on Fuzzy Intrusion Detection.", Proceedings of SPIE Vol. 5812, Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security, 28 March - 1 April 2005, Orlando, Florida, USA.
13 Bradford, P. G., and N. Hu. "A Layered Approach to Insider Threat Detection and Proactive Forensics.", 21st Annual Computer Security Applications Conference, Applied Computer Security Associates (ACSA), December 5-9, 2005, Tucson, Arizona.
15 Brugger, S. T. "Data Mining Methods for Network Intrusion Detection.", Terry Brugger's Homepage, University of California, Davis, 9 June 2004. Accessed 6 Oct. 2006 <http://www.bruggerink.com/~zow/papers/brugger_dmnid.pdf>.
16 Li, W. "Using Genetic Algorithm for Network Intrusion Detection.", Proceedings of the United States Department of Energy Cyber Security Group 2004 Training Conference, May 24-27, 2004, Kansas City, Kansas, USA.
18 Marczyk, A. "Genetic Algorithms and Evolutionary Computation.", The TalkOrigins Archive, 23 Apr. 2004. Accessed 7 Oct. 2006 <http://www.talkorigins.org/faqs/genalg/genalg.html>.
20 Song, D. "A Linear Genetic Programming Approach to Intrusion Detection.", Master's Degree in Computer Science, Genetic and Evolutionary Computation Conference (GECCO) 2003.
21 Smith, L. S. "An Introduction to Neural Networks.", Professor Leslie S. Smith, Centre for Cognitive and Computational Neuroscience, 2 Apr. 2003. Accessed 7 Oct. 2006 <http://www.cs.stir.ac.uk/~lss/nnintro/invslides.html>.
22 Coull, S., Joel Branch, Boleslaw Szymanski, and Eric Breimer. "Intrusion Detection: a Bioinformatics Approach.", Proceedings of the 19th Annual Computer Security Applications Conference, Dec. 2003, Las Vegas, Nevada.
23 Gorodetsky, V., I. Kotenko, and O. Karsaev. "Multi-agent Technologies for Computer Network Security: Attack Simulation, Intrusion Detection and Intrusion Detection Learning.", International Journal of Computer Systems Science and Engineering, Vol. 18, No. 4, July 2003, pp. 191-200.
24 Gomez, J., and D. Dasgupta. "Evolving Fuzzy Classifiers for Intrusion Detection.", Proceedings of the 2002 IEEE Workshop on Information Assurance, United States Military Academy, June 2001, West Point, NY.
25 Liu, Q., S. Bridges, and I. Banicescu. "Parallel Genetic Algorithms for Tuning a Fuzzy Data Mining System.", Proceedings of the Artificial Neural Networks in Engineering Conference (ANNIE 2001), November 4-7, 2001, St. Louis, MO.
26 Chittur, A. "Model Generation for an Intrusion Detection System Using Genetic Algorithms.", High School Honors Thesis, Ossining High School, Ossining, NY, 27 Nov. 2001.
27 Dasgupta, D., and F. A. Gonzalez. "An Intelligent Decision Support System for Intrusion Detection and Response.", Lecture Notes in Computer Science (Springer-Verlag), Proceedings of the International Workshop on Mathematical Methods, Models and Architectures for Computer Networks Security (MMM-ACNS), May 21-23, 2001, pp. 1-14, St. Petersburg, Russia.
28 Bridges, S. M., and R. B. Vaughn. "Fuzzy Data Mining and Genetic Algorithms Applied to Intrusion Detection.", Proceedings of the Twenty-third National Information Systems Security Conference, October 2000, Baltimore, MD.
29 Wang, W., and S. M. Bridges. "Genetic Algorithm Optimization of Membership Functions for Mining Fuzzy Association Rules.", Proceedings of the 7th International Conference on Fuzzy Theory & Technology, February 27 - March 3, 2000, pp. 131-134, Atlantic City, NJ.
30 Dasgupta, D. "Immunity-Based Intrusion Detection System: a General Framework.", 22nd National Information Systems Security Conference, The University of Memphis, 1999, Virginia, USA.
31 Sinclair, C., L. Pierce, and S. Matzner. "An Application of Machine Learning to Network Intrusion Detection.", Proceedings of the 15th Annual Computer Security Applications Conference, December 1999, p. 371, Phoenix, AZ.
32 Mukkamala, S., A. H. Sung, and A. Abraham. "Modeling Intrusion Detection Systems Using Linear Genetic Programming Approach.", RML Technologies, Inc., Oct. 1998.
35 Helmer, G., J. Wong, V. Honavar, and L. Miller. "Automated Discovery of Concise Predictive Rules for Intrusion Detection.", Recursions Software Inc., Ames, IA; Department of Computer Science, Iowa State University, Ames, IA, 2002.
40 Marin, J. A., D. Ragsdale, and J. Surdu. "A Hybrid Approach to Profile Creation and Intrusion Detection.", Information Technology and Operations Centre, United States Military Academy.