Negative Selection and Niching by an Artificial Immune System for Network Intrusion Detection




Jungwon Kim and Peter Bentley
Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, U.K.
Phone: +44-171-380-7329, Fax: +44-171-387-1397, e-mail: {J.Kim, P.Bentley}@cs.ucl.ac.uk

Abstract
This paper presents a negative selection algorithm with niching by an artificial immune system, for network intrusion detection. The paper starts by introducing the advantages of the negative selection algorithm as a novel distributed anomaly detection approach for the development of a network intrusion detection system. After discussing the problems of existing approaches using negative selection for network intrusion detection, this paper presents a modified negative selection algorithm with niching, which shows diversity and generality and requires less computation time. The network packet data used in this work is then introduced, and a novel genotype encoding scheme to handle this data and a corresponding fitness function are explained.

1 INTRODUCTION
The biological immune system has been successful at protecting the human body against a vast variety of foreign pathogens (Tizard, 1995). A growing number of computer scientists have carefully studied the success of this competent natural mechanism and proposed computer immune models for solving various problems including fault diagnosis, virus detection, and mortgage fraud detection (Dasgupta, 1998). Among these various areas, intrusion detection is a vigorous research area where the employment of an artificial immune system has been examined (Dasgupta, 1998), (Somayaji, Hofmeyr and Forrest, 1997). The main goal of intrusion detection is to detect unauthorised use, misuse and abuse of computer systems by both system insiders and external intruders. Among automated intrusion detection systems, a particular system for network intrusion detection, known as a network-based intrusion detection system (IDS), monitors any number of hosts on a network by scrutinising the audit trails of multiple hosts and network traffic.
It is usually comprised of two main components: an anomaly detector and a misuse detector (Mykerjee et al, 1994). The anomaly detector establishes the profiles of normal activities of users, systems, system resources, network traffic and/or services and detects intrusions by identifying significant deviations from the normal behaviour patterns observed from profiles. The misuse detector defines suspicious misuse signatures based on known system vulnerabilities and a security policy. This component probes whether these misuse signatures are present or not in the auditing trails. Currently many network-based IDSs have been developed using diverse approaches (Mykerjee et al, 1994). Nevertheless, there still remain unresolved problems to build an effective network-based IDS (Kim and Bentley, 1999a). As one approach of providing the solutions of these problems, the previous work (Kim and Bentley, 1999a) identified a set of general requirements for a successful network-based IDS and three design goals to satisfy these requirements: being distributed, self-organising and lightweight. In addition, Kim and Bentley (1999a) introduced a number of remarkable features of human immune systems that satisfy these three design goals. It is anticipated that the adoption of these features should help the construction of an effective network-based IDS.

This paper proposes the use of negative selection and niching of artificial immune system for developing an effective network-based IDS. An overall artificial immune model for network intrusion detection presented in (Kim and Bentley, 1999b) consists of three different evolutionary stages: negative selection, clonal selection, and gene library evolution. Among these stages, the first stage, negative selection, is investigated in this paper. We present a more efficient implementation of negative selection using a niching feature of artificial immune systems.
This paper is organised as follows: section 2 discusses a negative selection algorithm originally devised by Forrest, Hofmeyr, and Somayaji (1997) and a niching mechanism by an artificial immune system (Smith, Forrest, and Perelson, 1993) as a solution of identified current negative selection problems. Section 3 introduces a modified negative selection with niching and shows how this is employed for network intrusion

detection. Section 4 describes details of network traffic packet data used in this work. Then, in section 5, detailed implementation points including genotypes, phenotypes, genetic operators and fitness functions are provided. Finally, this paper concludes from this work and briefly describes future work.

2 RELATED WORK
The basic idea of the human immune system is the ability to distinguish self, which is normal, from non-self, which is abnormal. For a human body, various detector cells, called antibodies, are continuously generated and distributed to a whole body. The distributed antibodies monitor all living cells and detect non-self cells, called antigens, invading into a human body. This main procedure is performed by three evolutionary stages described above and each stage plays its different and significant role in making the overall immune system function successfully (Kim and Bentley, 1999a).

2.1 NEGATIVE SELECTION OF THE HUMAN IMMUNE SYSTEM
An important feature of the human immune systems is its ability to maintain diversity and generality. It is able to detect a vast number of antigens with a smaller number of antibodies. In order to make this possible, it is equipped with several useful functions (Kim and Bentley, 1999a). One such function is the development of mature antibodies through the gene expression process. The human immune system makes use of gene libraries in two types of organs called the thymus and the bone marrow. When a new antibody is generated, the gene segments of different gene libraries are randomly selected and concatenated in a random order, see figure 1. The main idea of this gene expression mechanism is that a vast number of new antibodies can be generated from new combinations of gene segments in the gene libraries.

[Figure 1: Gene Expression Process. Labels: Gene Library, Antigen, Antibody.]

However, this mechanism introduces a critical problem. The new antibody can bind not only to harmful antigens but also to essential self cells. To prevent such serious damage, the human immune system employs negative selection.
This process eliminates immature antibodies, which bind to self cells passing by the thymus and the bone marrow. From newly generated antibodies, only those which do not bind to any self cell are released from the thymus and the bone marrow and distributed throughout the whole human body to monitor other living cells. Therefore, the negative selection stage of the human immune system is important to assure that the generated antibodies do not attack self cells.

2.2 NEGATIVE SELECTION ALGORITHM
Even though the clear role of negative selection in a human immune system is to eliminate harmful antibodies, it shows some other important features, which can help us to devise a more effective anomaly detection algorithm. Conventional anomaly detection algorithms generally establish the normal behaviour of a monitored system and spot significant deviations from the established normal characteristics. The antigen detection mechanism by antibodies follows this conventional anomaly detection algorithm in a way, but it shows some other strengths over this conventional algorithm. Forrest et al (1994), (Forrest, Hofmeyr, and Somayaji, 1997) proposed and used a negative selection algorithm for various anomaly detection problems. This algorithm consisted of three phases: defining self, generating detectors and monitoring the occurrence of anomalies. In the first phase, it defines self in the same way that other anomaly detection approaches establish the normal behaviour patterns of a monitored system. In other words, it regards the profiled normal patterns as self patterns. In the second phase, it generates a number of random patterns that are compared to each self pattern defined in the first phase. If any randomly generated pattern matches a self pattern, this pattern fails to become a detector and thus it is removed. Otherwise, it becomes a detector pattern and monitors subsequent profiled patterns of the monitored system. During the monitoring stage, if a detector pattern matches any newly profiled pattern, it is then considered that a new anomaly must have occurred in the monitored system.
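The three phases just described can be illustrated with a minimal Python sketch. It is not the implementation used by Forrest et al or in this paper: the bit-string representation, the `matches` rule (agreement in at least `threshold` positions), the toy self patterns and all parameter values are assumptions chosen only to keep the example small.

```python
import random

def matches(a, b, threshold):
    """Assumed matching rule: two equal-length bit strings match
    when they agree in at least `threshold` positions."""
    return sum(x == y for x, y in zip(a, b)) >= threshold

def generate_detectors(self_set, n_detectors, length, threshold, rng,
                       max_tries=100_000):
    """Phase 2: keep random candidates that match no self pattern."""
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = "".join(rng.choice("01") for _ in range(length))
        # A candidate that matches any self pattern is discarded.
        if not any(matches(candidate, s, threshold) for s in self_set):
            detectors.append(candidate)
    return detectors

def is_anomalous(pattern, detectors, threshold):
    """Phase 3: a newly profiled pattern is flagged if any detector matches it."""
    return any(matches(d, pattern, threshold) for d in detectors)

# Phase 1: self is the set of profiled normal patterns (toy values).
self_set = ["00000000", "00001111", "11110000"]
rng = random.Random(0)
detectors = generate_detectors(self_set, n_detectors=5, length=8,
                               threshold=7, rng=rng)
```

By construction no detector matches a self pattern, so monitoring the self patterns themselves raises no alarm, while a pattern close to a detector does.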
This negative selection algorithm has been successfully applied to detect computer viruses (Forrest et al, 1994), tool breakage detection and time-series anomaly detection (Dasgupta, 1998). Besides these practical results, D'haeseleer, Forrest and Helman (1997) showed several advantages of negative selection as a novel distributed anomaly detection approach. One of the formidable features is that this novel approach does not define specific anomalies to be detected and thus it does not require the prior knowledge of anomalies. This feature allows it to be able to detect previously unseen anomalies. In addition, the detection is distributed and local. This trait originates from the aggregation of distributed and independent detector detection. That is to say, an individual detector contains only a subset of the patterns needed to describe all existing anomalies, and it monitors only small parts of the system. Therefore, each detector recognises only the anomalies of the small section of the system that it monitors, and the overall abnormal status is diagnosed by the collection of independent detection

results. Moreover, this distributed detection by local detectors provides robustness within the system. The anomaly detection problem for computer security such as computer virus detection and intrusion detection especially requires robustness of the detection algorithm. It has to be robust enough to withstand the attack and any system faults. The multiple detection points by independent detectors and the uniqueness of each detector allow it to be robust (Kim and Bentley, 1999a), (Forrest, Hofmeyr, and Somayaji, 1997).

However, the current negative selection algorithms show several drawbacks. The most significant problem is the excessive computational time caused by the random-generation approach to building valid detectors. This results in the exponential growth of computational effort with the size of self patterns (D'haeseleer, 1997). Moreover, it is very difficult to know whether the number of generated detectors is large enough to satisfy the acceptable detection failure probability. D'haeseleer derived a formula presenting an appropriate number of detectors when an acceptable failure probability is given and claimed that the derived formula allows the negative selection algorithm to tune its detection accuracy against the cost of generating and storing detectors. However, this work has been accomplished under some unrealistic assumptions: it does not take into account false positive error and independence between self patterns. Furthermore, he only considered binary patterns and a simple r-contiguous bit matching rule. Nevertheless, it is not easy to estimate the appropriate number of detectors when the negative selection algorithm employs numerical patterns and a more sophisticated matching rule. Therefore, this difficulty may force the negative selection algorithm to adopt an arbitrary number of detectors and this may cause an unexpectedly low detection accuracy or inefficient computation by generating more than a sufficient number of detectors.
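For readers unfamiliar with the r-contiguous bit matching rule mentioned above, the following Python sketch shows the usual definition: two equal-length strings match when they agree in at least r consecutive positions. The function name and the example strings are illustrative assumptions, not taken from the cited work.

```python
def r_contiguous_match(a: str, b: str, r: int) -> bool:
    """Return True when a and b agree in at least r contiguous positions."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    if r < 1:
        raise ValueError("r must be at least 1")
    run = 0  # length of the current run of agreeing positions
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False

# These two strings agree on positions 1-3 and 5-7 (0-based),
# so the longest agreeing run has length 3.
detector = "10110100"
pattern = "00111100"
match_r3 = r_contiguous_match(detector, pattern, 3)
match_r4 = r_contiguous_match(detector, pattern, 4)
```

A larger r makes each detector more specific, which is why the choice of r interacts with the number of detectors needed.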
2.3 NICHING OF ARTIFICIAL IMMUNE SYSTEMS
Even though new antibodies surviving negative selection are assured to be self-tolerant, their efficacy to detect antigens is unknown when they are released from the bone marrow and the thymus. This is because new antibodies are randomly generated and they are verified only not to be self. They might hold non-self patterns but not antigen patterns. In order to exclude these ineffectual detectors, the human immune system adopts the evolution of antibodies towards the existing antigen patterns (Tizard, 1995). During this evolution process, the human immune system uses its own unique niching strategy to maintain generality and diversity of antibodies as one part of clonal selection process (Forrest et al, 1993). In a human immune system, this niching process is operated only after antibodies are released from the thymus and the bone marrow. However, for an artificial immune system, this remarkable feature of a human immune system, which maintains diversity and generality of antibodies, can be applied for improving the efficiency of negative selection algorithm. By using this evolution process rather than the originally suggested random generation of detectors, computational time can be reduced. In addition, the problem of tuning the appropriate number of detectors may be solved by multimodal convergence feature of a niching strategy.

Forrest et al (1993) presented the niching strategy of their artificial immune system which follows the analogy of the human immune systems. They explored whether it is able (i) to detect common patterns of randomly presented antigens and (ii) to discern and maintain the diverse antigen population. In their model, they created one population of antibodies and one population of antigens randomly. They used the GA to evolve the antibody population under a constant antigen population. Conforming to the niching strategy of the human immune system, for each generation, their modified GA selects an arbitrary size of random sample from the antibody population and a single random antigen from the antigen population.
After each antibody in the sample is matched against a selected antigen, the fitness score of only one antibody showing the highest match score is increased while the fitness scores of the others remain the same. Using this algorithm, Forrest et al (1993) showed antibodies evolved to be generalists that match to most antigens to some extent. Their analysis of this result showed that antibodies evolved towards finding common schema that is shared among many antigens. Through the various experiments, they observed that this algorithm could sustain multiple inconsistent antibody patterns, which appear as the multiple peaks at a search space, and the similarity among antigens does not affect this capability. Moreover, they compared this niching strategy of the artificial immune system with the fitness sharing algorithm (Smith, Forrest, and Perelson, 1993). From this comparison, they reported that as the result of antibody sampling mechanism, the niching strategy of the artificial immune system controls its generality via the antibody sample size. To be more precise, when the sample size decreases, the selective pressures are moved towards generating a population of more general antibodies.

3 ARTIFICIAL IMMUNE SYSTEM FOR NETWORK INTRUSION DETECTION
While various artificial immune models have been suggested for diverse purposes (Dasgupta, 1998), previous work (Kim and Bentley, 1999a) introduced the salient functions of the human immune system with respect to network intrusion detection. In this work, we view the normal activities of monitored networks as self and their abnormal activities as non-self and design an artificial immune system for distinguishing normal network activities from abnormal network activities. Most network-based IDSs monitor network packets and their identified anomalies show critical signatures of various

network intrusions (Mykerjee et al, 1994). Thus, the artificial immune system is designed for distinguishing normal network activities from abnormal network activities and expected to detect various network intrusions.

[Figure 2: The Physical Architecture of an Artificial Immune System. Labels: Router, Primary IDS, Secondary IDS, Detectors, Communicator, Communication flow, Network packets.]

Based on this view, we proposed a novel artificial immune system for network intrusion detection (Kim and Bentley, 1999b), see figure 2. The artificial immune system for network intrusion detection consists of a primary IDS and secondary IDSs. For the artificial immune system, the primary IDS, which we view as being equivalent to the bone marrow and thymus, generates numerous detector sets. Each individual detector set describes abnormal patterns of network traffic packets, which are transferred to a monitored single network domain. It is unique and transferred to each local host. We view local hosts as secondary lymph nodes, detectors as antibodies and network intrusions as antigens. At the secondary IDSs, which are local hosts, detectors are background processes which monitor whether non-self network traffic patterns are observed from network traffic patterns profiled at the monitored local host. The primary IDS and each secondary IDS have communicators to allow the transfer of information between each other. For the proposed artificial immune system, several sophisticated mechanisms of the human immune system are embedded in three evolutionary stages: gene library evolution, negative selection and clonal selection. These processes allow the artificial immune system to satisfy the identified main goals for designing an effective network-based IDS (Kim and Bentley, 1999a).
3.1 NEGATIVE SELECTION ALGORITHM FOR NETWORK INTRUSION DETECTION
Among the three evolutionary stages comprising the artificial immune system, during the negative selection stage, the system generates diverse pre-detector patterns and selects mature detector patterns by eliminating false pre-detector patterns by binding them to self patterns (Kim and Bentley, 1999b). To apply the negative selection algorithm, firstly, we need to generate pre-detectors and this requires the creation of a gene library containing various genes. For the human immune system, the immature antibodies are generated via the gene expression process, in which the gene segments of different gene libraries are randomly selected and rearranged in a random order. From this process, the genes of the gene libraries contain the genetic information that determines the specific structure of antibody binding area, which will be the complementary structure of existing antigen binding area. These genes are usually inherited from ancestors' genes. To be more precise, the genes of the gene library of the human immune system initially have some knowledge about the antigens that had attempted to attack an ancestor's body. Returning to our problem, the genes of the initial gene library of the artificial immune system, which will be the genes of pre-detectors, can be the selected fields of profiles to describe anomalous network traffic patterns. The initial genes might be set by the values of these fields that are observed when a previously known network intrusion is simulated. However, the simulation of network intrusion can be a difficult task if network administrators and users of the monitored network are not co-operative. For this reason, we employ Forrest et al's negative selection algorithm (Forrest et al, 1994) to generate pre-detectors, which does not initially require any network intrusion simulation. As described in section 2.2, this algorithm consists of three stages: defining self, generating detectors and monitoring the occurrence of anomalies. In the first step, we define self by building the profile of normal network activities.
After understanding the detailed mechanisms of network protocol and their security holes, we can define the fields of profiles. The details of these fields are described in section 4. In general, the fields of the created profiles represent the normal activities of TCP/IP protocol for each single connection. In the second step, the negative selection algorithm randomly generates the pre-detectors, whose fields are the same as those of self profiles but the values of these fields are randomly generated. The generated field values of these pre-detectors are compared to those in the self profiles. If the values of the common fields of both 'self' and 'pre-detector' are similar enough, this pre-detector is removed. The scheme to measure this similarity is discussed in section 5. The surviving pre-detectors become detectors which contain some specific values of originally defined fields of the self profile. In the third step, we continuously generate the profiles of current network activities in the same way and compare their field values with the detectors' field values. If the values of the same fields of both any new self and a detector are similar enough, this self pattern is regarded as the signature of network intrusion.

3.2 NEGATIVE SELECTION ALGORITHM WITH NICHING
Even though the negative selection algorithm provides several strengths for network intrusion detection, it is

necessary to resolve the excessive computational time caused by the random generation approach. D'haeseleer (1997) introduced more efficient detector generation algorithms: a linear-time algorithm and a greedy algorithm. The basic idea is to provide an efficient method to enumerate all candidate detectors, thus allowing the negative selection algorithm to select valid detectors from this complete candidate detector set. However, these algorithms can be used only for a binary immune system using a simple r-contiguous-bits matching rule. This is because they enumerate all possible valid detectors by counting the recurrence of all the potential r-contiguous-bit binary strings unmatching self strings. D'haeseleer also suggested the use of a non-binary alphabet immune system as an important future investigation because it is more natural in many cases. As a result, instead of using one of these algorithms, the negative selection algorithm for network intrusion detection introduced in this paper adopts the niching strategy of Smith, Forrest, and Perelson's (1993) artificial immune system to build a valid detector set. The modified negative selection algorithm with niching simply replaces the random generation of pre-detectors with the evolution of pre-detectors towards non-self.

In the first phase, the modified negative selection algorithm builds self profiles. In this research, the raw network traffic packets were gathered and these packets were parsed and built into self profiles. These profiles are equipped with previously identified fields, which can distinguish normal and abnormal network activities. Then, the profiles are encoded in an appropriate data representation. In the second phase, when all the self profiles are encoded, the negative selection algorithm with niching starts generating detectors. In this case, a number of different self profiles were created and thus the negative selection algorithm should generate a different detector set for each self profile. This second phase is repeated for each self profile until all the self profiles have their own detector sets.
The second phase of this algorithm for generating a detector set for a self profile can be summarised as follows:

For each self profile and its corresponding detector set:
1. D detector patterns are generated at random and their fitness values are initialised with zeroes.
2. A sample of N detector patterns is randomly selected from the generated D detector patterns.
3. A single self pattern is randomly selected from the self profile.
4. Each detector in the sample is compared to the selected self and the degree of similarity is measured.
5. The fitness value of the single detector in the sample that shows the least similarity is increased. The fitness values of other detectors remain the same.
6. The processes 2-5 are repeated (for typically three times the number of antibodies (Smith, Forrest, and Perelson, 1993)).
7. The fittest P_b % detector patterns are selected as parents and genetic operators such as crossover and mutation are applied to generate new detectors.
8. The worst P_w % detector patterns are deleted to make space for children.
9. A new detector population is created by including the selected parent detectors and the offspring detectors generated in 7.
10. Processes 2-8 are repeated until the fitness values cease to change.

After finishing the second phase by performing the above, the negative selection algorithm builds new self profiles by parsing newly captured network packets. In the third phase, the detector patterns in each detector set are compared to the patterns in each corresponding new self profile. If the similarity between any detector pattern and new self pattern is beyond a predefined threshold, the algorithm generates an alarm signal.

As seen in section 2.3, this niching strategy controls the generality of each detector according to a detector sample size. For practical reasons, we expect this algorithm to create more general detectors so that each detector can match more than one intrusion. This means that even though each detector cannot bind to one intrusion exactly, it can match a number of intrusions to some degree.
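Steps 1 to 9 of the procedure summarised above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the `similarity` function (fraction of equal positions) stands in for the measure described in section 5, detectors are short tuples of small integers, and the crossover, mutation and population-replacement details are hypothetical choices that keep the population size constant.

```python
import random

def similarity(detector, self_pattern):
    """Assumed similarity: fraction of positions holding equal values."""
    same = sum(d == s for d, s in zip(detector, self_pattern))
    return same / len(self_pattern)

def score_detectors(detectors, self_profile, sample_size, rng, rounds=None):
    """Steps 1-6: in each round, reward the sampled detector that is
    least similar to one randomly chosen self pattern."""
    fitness = [0] * len(detectors)                               # step 1
    if rounds is None:
        rounds = 3 * len(detectors)                              # step 6
    for _ in range(rounds):
        sample = rng.sample(range(len(detectors)), sample_size)  # step 2
        self_pattern = rng.choice(self_profile)                  # step 3
        winner = min(sample,                                     # steps 4-5
                     key=lambda i: similarity(detectors[i], self_pattern))
        fitness[winner] += 1
    return fitness

def next_generation(detectors, fitness, p_best, p_worst, rng, n_values=4):
    """Steps 7-9 with assumed operators: one-point crossover followed by
    mutation of a single randomly chosen gene."""
    order = sorted(range(len(detectors)), key=lambda i: fitness[i], reverse=True)
    n_parents = max(2, len(detectors) * p_best // 100)           # step 7
    n_remove = len(detectors) * p_worst // 100                   # step 8
    parents = [detectors[i] for i in order[:n_parents]]
    survivors = [detectors[i] for i in order[:len(order) - n_remove]]
    children = []
    for _ in range(n_remove):
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, len(a))
        child = list(a[:cut] + b[cut:])
        child[rng.randrange(len(child))] = rng.randrange(n_values)
        children.append(tuple(child))
    return survivors + children                                  # step 9

rng = random.Random(1)
self_profile = [(0, 0, 1, 1), (0, 1, 1, 1)]          # toy self patterns
detectors = [tuple(rng.randrange(4) for _ in range(4)) for _ in range(10)]
fitness = score_detectors(detectors, self_profile, sample_size=3, rng=rng)
detectors = next_generation(detectors, fitness, p_best=30, p_worst=30, rng=rng)
```

Step 10 would wrap the two calls in a loop that stops once the fitness values no longer change; the stopping test is omitted here because the text does not specify how stability is measured.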
This approach is more likely to be suitable for network intrusion detection. This is because, as we can clearly see in the next section, the length of each self chromosome used in this work and the search space which these self chromosomes form are much larger and more complex than the search spaces handled in most work using a simple negative selection algorithm (D'haeseleer, 1997). Furthermore, we expect the computation time of the modified negative selection to be less due to using evolution rather than random search. Finally, the appropriate number of detectors will also be naturally determined based on the multimodal convergence of evolution process.

4 NETWORK TRAFFIC DATA
In this section, the details about network traffic data used for this work are described.

4.1 DATA GATHERING
The data chosen for this research is available at http://iris.cs.uml.edu:8080/network.html. This is a set of tcpdump data and was collected for a part of an Information Exploration Shootout, which is a project providing several datasets publicly available for exploration and discovery and collecting the results of participants. The network packet capturing tool, tcpdump, was executed on the single gateway that connects an

intra-LAN to external networks. It captured TCP packet headers that passed between the intra-LAN and external networks as well as within the intra-LAN. Five different data sets were generated. The TCP packet headers of the first set were collected when no intrusion occurred and the other four sets were collected when four different intrusions were simulated. These intrusions are: IP spoofing attack, guessing rlogin or ftp passwords, scanning attack and network hopping attack. The details of attack signatures and attack points of the four different attacks are not available. This data originally had the fields of tcpdump format such as time stamp, source IP address, source port, destination IP address, destination port, etc.

4.2 DATA PROFILING
Since tcpdump is not designed for security purposes, its primitive fields are not enough to build a meaningful profile. Consequently, the first stage of our data profiling program is to extract more meaningful fields, which can distinguish normal and abnormal activities. Many researchers have identified the security holes of TCP protocols (Porras and Valdes, 1998) and so the fields used by our profiles are selected based on the extensive study of this research. They are usually defined to describe the activities of each single connection. The automated profile program was developed to extract the connection level information from TCP raw packets. The TCP packet headers in the original file were collected according to chronological order. These original data were dumped into MS SQL-Server DBMS and the automated profile program was implemented in JAVA using JDBC accessing SQL-Server.

4.2.1 Profile Fields
Each connection is established between a source port executing on a source host and a destination port operating on a destination host. For TCP protocol, each time the source port process of a source host intends to communicate with the destination port process of a destination host, it establishes a connection between them.
For each TCP connection, the following fields are extracted:

Connection identifier: each connection is defined by four fields: initiator address, initiator port, receiver address and receiver port. These four fields are included in the profile first in order to identify each connection.

Known port vulnerabilities: many network intrusions exploit various types of port vulnerabilities. There are fields to indicate whether the initiator port or the receiver port potentially holds these known vulnerabilities.

3-way handshaking: the TCP protocol uses 3-way handshaking for reliable communication. Network intrusions often violate the 3-way handshaking rule. Thus, there are fields to check for occurrences of 3-way handshaking errors.

Traffic intensity: network activity can be observed by measuring the intensity over one connection. For example, the number of packets and the number of kilobytes for one specific connection can describe the normal network activity of that connection.

In total, the self profile has 35 different fields of these four types.

4.2.2 Profiling Categories

Even though the network profile fields were extracted to describe the activity of a single connection, the data used in this research is too limited to apply this initial profile. The limitation is that the data was collected for quite a short time, around 15~20 minutes. During this brief period, most distinct connections were established only once, so too little data was collected to build separate connection profiles. Therefore, it is necessary to group different connections into several meaningful categories until each category has a sufficient number of connections to build a profile. Consequently, the total number of connections for each potential profile category was counted. First of all, the data was categorised into two groups: inter-connection and intra-connection. Inter-connection is the group of connections that were established between internal hosts and external hosts, and intra-connection is the group of connections that were established between internal hosts.
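As a minimal sketch of the grouping just described, the following Python fragment represents a connection by its four identifying fields and assigns it to the inter-connection or intra-connection group. The names `Connection`, `connection_group` and `internal_hosts` are hypothetical, the example addresses are invented, and the original profile program was written in Java, so this only illustrates the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Connection identifier: the four fields that define one TCP connection."""
    initiator_addr: str
    initiator_port: int
    receiver_addr: str
    receiver_port: int


def connection_group(conn: Connection, internal_hosts: set) -> str:
    """Return 'intra-connection' when both end points are internal hosts
    and 'inter-connection' when exactly one end point is internal."""
    initiator_internal = conn.initiator_addr in internal_hosts
    receiver_internal = conn.receiver_addr in internal_hosts
    if initiator_internal and receiver_internal:
        return "intra-connection"
    if initiator_internal or receiver_internal:
        return "inter-connection"
    # Assumption: traffic between two external hosts belongs to neither group.
    return "unclassified"


# Hypothetical addresses, used only for illustration.
internal = {"10.0.0.1", "10.0.0.2"}
web_request = Connection("10.0.0.1", 1055, "192.0.2.7", 80)
local_telnet = Connection("10.0.0.1", 1056, "10.0.0.2", 23)
groups = [connection_group(c, internal) for c in (web_request, local_telnet)]
```

Here `groups` holds one label per connection, which is the count key needed when tallying connections per category.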
Furthermore, to preserve anonymity, all internal hosts have a single fake address, 2, and no extra information about external hosts or network topology is provided. Therefore, profiles of specific hosts cannot be built. Instead, in this research, only the profiles of specific ports on any host are considered. For the various possible categories, the number of established connections of each profile was counted. In each case, apart from profile classes that have more than 100 connections, the other profile classes were regrouped into different classes until each class had more than 100 connections. Finally, 13 different self profiles were built. Their class names and numbers of established connections are shown in table 1. In table 1, the class column of inter-connection is shown as {(a, b), (c, d)}, where a is an internal host address, b is an internal port number, c is an external host address and d is an external port number. Hence, the connection is established between (a, b) and (c, d). For the class column of intra-connection, a is an internal host address, b is an internal port number, c is an internal host address and d is an internal port number. * indicates any host address or any port number. In addition, well-known denotes the ports in the range 0 to 1023, which are trusted ports. These ports are restricted to the superuser: a program must be running as root to listen for a connection. The port

numbers of commonly used IP services, such as ftp, telnet and http, are fixed and belong to this range. However, many common network services employ an authentication procedure, and intruders often use them to sniff passwords. It is therefore worthwhile to monitor these ports separately from the other ports. Thus, if the number of connections for any profile category that is based on a specific port on any host is not sufficient, these categories are regrouped into two new classes, a well-known port class and a not well-known port class.

Table 1: Self Profiles

Class                                Number of Connections
Inter-connection
{(2, *), (*, 80)}                    5292
{(2, *), (*, 53)}                    919
{(2, *), (*, 113)}                   255
{(2, *), (*, 25)}                    192
{(2, *), (*, well-known)}            187
{(2, *), (*, not well-known)}        756
{(2, 53), (*, *)}                    940
{(2, 25), (*, *)}                    352
{(2, 113), (*, *)}                   145
{(2, well-known), (*, *)}            114
{(2, not well-known), (*, *)}        6050
Intra-connection
{(2, *), (2, well-known)}            190
{(2, *), (2, not well-known)}        189

5 IMPLEMENTATION

This section describes the detailed implementation of the negative selection algorithm with niching that is proposed in this work. It introduces the genotype and phenotype representation, the genetic operators and finally the fitness functions, which are based on the similarity between a detector pattern and a self pattern.

5.1 GENOTYPES AND PHENOTYPES

In this section, the handling of continuous values for genotype encoding, the genotype representations and the mapping between genotypes and phenotypes are described.

5.1.1 Discretisation

As seen in section 4, each network activity profile has 35 fields. Of these 35 fields, 28 have continuous values and the other 7 have discrete values. In particular, the continuous values of the 28 fields span a wide range. In order to handle this varied and broad range of values, a simple discretisation algorithm is required. There are many discretisation algorithms available (Freitas, 1997). Most of these algorithms require long processing times.
To allow the system to report the occurrence of intrusions immediately, a simple discretisation algorithm that requires less computing time is used in this work. This algorithm consists of two steps. In the first step, the overall range of real values for each field is sorted. In the second step, according to a given total cluster number, which is a variable, the number of records for each cluster is uniformly determined. In other words, the lower bound and higher bound of each cluster are determined by ensuring that each cluster contains the same number of records. This simple algorithm provides three different types of discretisation: non-overlapping, minimum overlapping and maximum overlapping. Because each cluster boundary is defined simply on the basis of an identical number of data points, the crisp boundary defined by this simple method may not be very reliable. Therefore, the overlapping variants of this algorithm are expected to correct this unreliability by defining fuzzy boundaries.

To be more precise, let $F$ be a continuous field of a self profile $S$ such that $F = \{f_1, f_2, \ldots, f_i, \ldots, f_l\}$ where $f_i \le f_{i+1}$. Here, $F$ has $l$ values, and this number is equal to the total number of gathered records in $S$. After performing discretisation, $F$ has $m$ clusters. This means that $F$ becomes $F = \{C_1, C_2, \ldots, C_j, \ldots, C_m\}$, where $C_j$ is a cluster such that $C_j = \{c_{j,1}, c_{j,2}, \ldots, c_{j,k}, \ldots, c_{j,n}\}$, $n$ is the number of points within the cluster, $n = l/m$, $c_{j,k} \le c_{j,k+1}$ and $c_{j,k} \in [f_1, f_l]$. For two adjacent clusters $C_j$ and $C_{j+1}$ such that

$C_j = \{c_{j,1}, c_{j,2}, \ldots, c_{j,k}, \ldots, c_{j,n}\}$ where $c_{j,k} \in [f_1, f_l]$,
$C_{j+1} = \{c_{j+1,1}, c_{j+1,2}, \ldots, c_{j+1,k}, \ldots, c_{j+1,n}\}$ where $c_{j+1,k} \in [f_1, f_l]$,

the relations of points in these two clusters become

$c_{j,n} < c_{j+1,1}$   ...(1)
$c_{j,n} = c_{j+1,1}$   ...(2)
$c_{j,k} = c_{j+1,k-1}$ where $k \in [2, n]$   ...(3)

Here, (1) represents when non-overlapping discretisation of integer values or real values is performed, (2) shows when minimum overlapping discretisation of integer

values is performed and (3) describes when maximum overlapping discretisation of integer values is performed. For minimum and maximum overlapping discretisation of real values, let $P_r$ be a given overlapping range proportion such that $0 \le P_r \le 0.5$, and let $N_r$ be the number of overlapping points. Then $N_r = n \cdot P_r$, where $n$ is the number of points in a cluster. The relations of points in two adjacent clusters $C_j$ and $C_{j+1}$ become

$c_{j,k} = c_{j+1,1}$ where $k = n - N_r$   ...(4)
$c_{j,k} = c_{j+1,1}$ where $k = N_r$   ...(5)

Here, $n$ is the number of points within a cluster. (4) shows when minimum overlapping discretisation of real values is performed and (5) represents when maximum overlapping discretisation of real values is performed.

In summary, the non-overlapping discretisation performs crisp clustering by following the original idea described above. The minimum overlapping discretisation for integer values allows neighbouring clusters to overlap by the minimum of one element. For maximum overlapping discretisation of integer values, adjacent clusters overlap in all points apart from one element. For real values, an overlapping proportion is predefined and it cannot be more than 50%. The minimum overlapping discretisation allows neighbouring clusters to overlap by the predefined proportion, while the maximum overlapping discretisation permits adjacent clusters to overlap by the remainder left by the predefined overlapping proportion.

5.1.2 Encoding

Even though the simple discretisation algorithm uses fuzzy boundaries among clusters, it is still not certain whether each cluster is formed without serious information loss. As a result, the negative selection with niching used in this work lets each cluster evolve. Instead of using a fixed boundary for each cluster, the lower bound and higher bound of each cluster adaptively evolve. In order to make this possible, genotypes are carefully designed as follows. Genotypes consist of 35 genes, where each gene represents one field of a detector. As described in section 4.2.1, the profile built for this work has 35 fields, and this number determines the total number of corresponding genes in the detectors.
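Before the gene layout is detailed, the equal-frequency discretisation of section 5.1.1, whose cluster intervals the genes refer to, can be sketched as follows. This is only an illustration under stated assumptions: the function name `equal_frequency_clusters` and the `shared` parameter (the number of points two adjacent clusters have in common) are hypothetical, the number of clusters is allowed to grow beyond m when clusters overlap, and the last cluster is stretched to the largest value.

```python
def equal_frequency_clusters(values, m, shared=0):
    """Return (lower, upper) bounds of clusters holding n = len(values) // m
    sorted points each; adjacent clusters have `shared` points in common."""
    points = sorted(values)              # step 1: sort the field values
    n = len(points) // m                 # step 2: same number of points per cluster
    if n < 1 or not 0 <= shared < n:
        raise ValueError("need at least one point per cluster and 0 <= shared < n")
    step = n - shared                    # distance between two cluster starts
    clusters = []
    start = 0
    while start + n <= len(points):
        window = points[start:start + n]
        clusters.append((window[0], window[-1]))
        start += step
    # Assumption: the last cluster absorbs any points left over at the top.
    last_lower, _ = clusters[-1]
    clusters[-1] = (last_lower, points[-1])
    return clusters


values = [5, 1, 9, 3, 7, 11, 2, 8, 4, 10, 6, 12]
crisp = equal_frequency_clusters(values, m=3)                   # relation (1)
min_overlap = equal_frequency_clusters(values, m=3, shared=1)   # relation (2)
max_overlap = equal_frequency_clusters(values, m=3, shared=3)   # relation (3)
# crisp == [(1, 4), (5, 8), (9, 12)]
```

With `shared=1` the higher bound of one cluster equals the lower bound of the next, and with `shared=n-1` each cluster starts at the second point of its predecessor, which mirrors the integer-valued minimum and maximum overlapping cases.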
Each gene comprises a cluster number, a minimum offset number and a maximum offset number. As seen in figure 3, each nucleotide has an alphabet of cardinality 10 with values from 0 to 9. For example, the gene g1 indicates the first field of a profile, the number of packets sent by an initiator. This field has integer values, and so these values were discretised into n clusters based on a predefined cluster number. When the discretisation is performed, a cluster table is generated. It contains the intervals of the clusters indexed by ascending sequential numbers. Thus, each field in a self profile has its corresponding cluster table, and the first nucleotide of the corresponding gene represents the cluster number stored in the cluster table. To serve the purpose of cluster evolution, an offset table is created along with a cluster table. As shown in figure 3, this table has two columns: offset ID and offset point. The offset point shows the actual value to which the lower bound and the higher bound of each cluster can be moved. For instance, in figure 3, the second nucleotide of gene 1, 1, corresponds to offset ID 1 in the offset table. The offset point of this offset ID 1 is 3. Thus, the lower bound of cluster 2, which is the cluster number shown in the first nucleotide, changes from the original 6 to 3. Similarly, the original higher bound 10 of cluster 2 is changed to 15 by following the offset point of offset ID 3 in the third nucleotide.

DETECTOR
gene 1      gene 2      ...      gene 35
2 1 3       0 0 1       ...      6 2 4

Cluster and offset tables of gene 1

Cluster table 1             Offset table 1
ID    Interval              ID    Offset point
1     [1..5]                1     3
2     [6..10]               2     8
3     [11..20]              3     15
...   ...                   ...   ...
9     [60..78]              9     69
0     [79..250]             0     140

Figure 3: Detector Genotypes

In this case, all the offset points of an offset table are dynamically determined by considering the variance of points in each cluster. The discretisation algorithm used in this work creates clusters that contain an identical number of points. As a consequence, the interval size of each cluster, calculated as higher bound minus lower bound, varies. Offset points are determined by considering these varying interval sizes.
To be more precise, when a total number of offset points $T_o$ is given, let $N$ be the number of offset points defined within each cluster; then $N$ is defined as $N = T_o / m$, where $m$ is the number of clusters. Let $O_j$ be the set containing the offset points that belong to a cluster $C_j$; then, according to the calculated number of offset points $N$ within a cluster, $O_j$ is defined as $O_j = \{O_{j,1}, \ldots, O_{j,i}, \ldots, O_{j,N}\}$ where

$O_{j,i} = O_{j,i-1} + (c_{j,n} - c_{j,1}) / (N + 1)$, $i \in [1, N]$, and $O_{j,0} = c_{j,1}$.

Here, $c_{j,1}$ is the lower bound and $c_{j,n}$ is the higher bound of the cluster $C_j$. Therefore, as is shown in figure 3, offset points are dynamically defined depending on the variance of points in an individual cluster. However, this method permits both the lower bound and the higher bound of each cluster to move to any offset point. Consequently, it can cause the lower bound of a cluster to be larger than its higher bound. In this case, these two bounds are simply swapped. In this dynamic genotype, the granularity of discretisation is tuned according to the given total numbers of clusters and offset points. As these numbers are set to larger values, genotypes are expected to have more specific clusters. Similarly, the nucleotides in the genes which define the other two offset numbers can be extended to two nucleotides or even more. Since the total numbers of clusters and offset points are given before starting the negative selection algorithm, a whole chromosome still consists of a fixed number of nucleotides, each having an alphabet of cardinality 10. Finally, for a nominal type of field, such as the well-known source port, which indicates whether a given source port is well known or not, a cluster interval of a cluster table is defined simply by the meaning of each group, and an offset table is not generated.

5.1.3 Mapping

While the generation of detectors and self profiles and the application of genetic operators are performed at the genotype level, the measurement of the similarity between a selected detector pattern and a self pattern is performed at the phenotype level. This is another difference from most work using a negative selection algorithm (Forrest, Hofmeyr, and Somayaji, 1997), (Dasgupta, 1998). Such work usually performed this evaluation procedure at the genotype level using a simple r-contiguous bit matching rule. In this work, in order to measure the similarity, genotypes for each generation are mapped onto phenotypes.
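The mapping of a single gene onto an interval can be sketched with the values of figure 3. The code below is an illustration, not the authors' implementation: the function names `offset_points` and `decode_gene` are hypothetical, the tables contain only the rows visible in figure 3, and the offset points are assumed to be evenly spaced between the lower and higher bound of a cluster.

```python
# Rows of cluster table 1 and offset table 1 as they appear in figure 3.
CLUSTER_TABLE = {1: (1, 5), 2: (6, 10), 3: (11, 20), 9: (60, 78), 0: (79, 250)}
OFFSET_TABLE = {1: 3, 2: 8, 3: 15, 9: 69, 0: 140}


def offset_points(lower, upper, n_offsets):
    """Evenly spaced offset points inside one cluster (assumed spacing):
    O_i = O_(i-1) + (upper - lower) / (N + 1), starting from O_0 = lower."""
    step = (upper - lower) / (n_offsets + 1)
    return [lower + i * step for i in range(1, n_offsets + 1)]


def decode_gene(gene, cluster_table, offset_table):
    """Map one gene (cluster number, minimum offset ID, maximum offset ID)
    onto its original cluster interval and its evolved interval."""
    cluster_id, min_offset_id, max_offset_id = gene
    original = cluster_table[cluster_id]
    lower = offset_table[min_offset_id]
    upper = offset_table[max_offset_id]
    if lower > upper:
        # The bounds may cross after evolution; they are simply swapped.
        lower, upper = upper, lower
    return original, (lower, upper)


original, evolved = decode_gene((2, 1, 3), CLUSTER_TABLE, OFFSET_TABLE)
# original == (6, 10), evolved == (3, 15)
```

With one offset point per cluster, `offset_points(6, 10, 1)` gives the midpoint 8.0, which agrees with offset ID 2 in figure 3.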
Therefore, step 4 of the negative selection with niching described in section 3.2 is extended as follows:

4.1 The genotype of each detector in the sample is mapped onto a phenotype according to the instructions in the cluster tables and offset tables.

4.2 The degree of similarity between the phenotype of the detector and the selected self is measured.

The degree of similarity measured at the phenotype level directly determines fitness values. More details on the fitness function are described in the next section.

5.2 FITNESS FUNCTIONS

Phenotypes mapped from evolved genotypes are represented in the form of detector patterns. As shown in figure 4, a field of a detector phenotype is represented by an interval having a lower bound and a higher bound, while a field of a self phenotype is described by one specific value. Hence, the first step of measuring the similarity checks whether the value of each field of a self pattern belongs to the corresponding interval of a detector phenotype. When a value of a self pattern field is not included in its corresponding interval of a detector phenotype, these two fields are not matched. In this case, the degree of similarity is measured by the distance from the value of the self pattern field to the closer of the lower bound and the higher bound. These two bounds comprise the interval of the corresponding field of the detector pattern. After assigning this distance as the similarity score of an individual field of the detector pattern, the total similarity score of a given detector pattern is calculated by summing all these individual similarity scores. It should be noted that before summing them up, each score must be normalised.

Detector Phenotype = (Number of Packets = [10, 26], Duration = [0.3, 0.85], Termination = 'half closed', etc.)
Self Phenotype = (Number of Packets = 35, Duration = 0.37, Termination = 'normal', etc.)

Figure 4: A Detector Phenotype and a Self Phenotype

Finally, for a nominal type of field, when two fields are not matched, the maximum similarity score that can be given to an individual field is assigned to this nominal field.
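The scoring just described can be sketched with the three fields of figure 4. Several details are assumptions of this sketch, since the text does not fix them: a matched field scores 0, each distance is normalised by dividing by a hypothetical value range of the field (`field_ranges`), the maximum per-field score is 1.0, and all function and key names are invented for illustration.

```python
MAX_FIELD_SCORE = 1.0  # assumed maximum score of one normalised field


def interval_field_score(value, lower, upper, field_range):
    """Distance from a self value to the closer bound of the detector
    interval, normalised by the field range; 0.0 when the value is inside."""
    if lower <= value <= upper:
        return 0.0
    distance = min(abs(value - lower), abs(value - upper))
    return min(distance / field_range, MAX_FIELD_SCORE)


def nominal_field_score(value, detector_value):
    """Nominal fields carry no order, so any mismatch gets the maximum score."""
    return 0.0 if value == detector_value else MAX_FIELD_SCORE


def total_score(detector, self_pattern, field_ranges):
    """Sum of the normalised per-field scores of one detector against one self."""
    total = 0.0
    for name, self_value in self_pattern.items():
        detector_value = detector[name]
        if isinstance(detector_value, tuple):      # interval field
            lower, upper = detector_value
            total += interval_field_score(self_value, lower, upper, field_ranges[name])
        else:                                      # nominal field
            total += nominal_field_score(self_value, detector_value)
    return total


detector = {"packets": (10, 26), "duration": (0.3, 0.85), "termination": "half closed"}
self_pattern = {"packets": 35, "duration": 0.37, "termination": "normal"}
field_ranges = {"packets": 250.0, "duration": 1.0}  # hypothetical value ranges
score = total_score(detector, self_pattern, field_ranges)
```

Under these assumptions the packet count lies 9 above the higher bound and contributes 9/250, the duration matches and contributes 0, and the unmatched nominal field Termination receives the maximum per-field score of 1.0.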
This is because a nominal type of field does not have information about the order between different clusters. Hence, a uniform similarity score is given for any unmatched case.

5.3 GENETIC OPERATORS

As introduced in section 3.2, the negative selection algorithm with niching presented in this work applies two genetic operators, crossover and mutation. Since a fixed number of nucleotides with a cardinality of 10 represents a genotype, a simple one-point crossover is applied by selecting a random crossover point. A random mutation is also applied with a low probability.

6 CONCLUSIONS

This paper has described the use of a negative selection algorithm with niching for network intrusion detection. After an existing negative selection algorithm was analysed, this paper proposed a modified negative selection algorithm with niching, and the anticipated advantages of this modified approach for network intrusion detection were discussed. Based on these studies, real network packet data used for this work were

introduced. Finally, this paper outlined a number of novel implementation aspects, including a novel genotype allowing the evolution of clusters together with the evolution of detectors, and a fitness function which evaluates at the phenotype level. Based on the implementation details introduced in this paper, various experiments are currently being performed. In particular, the experiments focus on the investigation of the complexity of computation and the maintenance of generality and diversity of evolved detectors.

Acknowledgements

This work has been partially supported by the Korea International Collaboration Research Funds (I-03-02), the Ministry of Science and Technology, Korea.

References

D'haeseleer, P., 1997, A Distributed Approach to Anomaly Detection, ACM Transactions on Information System Security. http://www.cs.unm.edu/~patrik/

Dasgupta, D., 1998, An Overview of Artificial Immune Systems and Their Applications, In Dasgupta, D. (editor), Artificial Immune Systems and Their Applications, Berlin: Springer-Verlag, 3-21.

Forrest, S. et al, 1993, Using Genetic Algorithms to Explore Pattern Recognition in the Immune System, Evolutionary Computation, 1(3), 191-211.

Forrest, S. et al, 1994, Self-Nonself Discrimination in a Computer, Proceeding of 1994 IEEE Symposium on Research in Security and Privacy, Los Alamos, CA: IEEE Computer Society Press. http://www.cs.unm.edu/~forrest/papers.html

Forrest, S., Hofmeyr, S., and Somayaji, A., 1997, Computer Immunology, Communications of the ACM, 40(10), 88-96.

Freitas, A., 1997, Generic, Set-Oriented Primitives to Support Data-Parallel Knowledge Discovery in Relational Database Systems, Ph.D. Thesis, University of Essex, UK, July.

Smith, R. E., Forrest, S., and Perelson, A. S., 1993, Searching for Diverse, Cooperative Populations with Genetic Algorithms, Evolutionary Computation, 1(2), 127-149.

Tizard, I. R., 1995, Immunology: Introduction, 4th Ed, Saunders College Publishing.

Kim, J. and Bentley, P., 1999a, The Human Immune System and Network Intrusion Detection, 7th European Conference on Intelligent Techniques and Soft Computing (EUFIT '99), Aachen, Germany (to appear).

Kim, J. and Bentley, P., 1999b, The Artificial Immune Model for Network Intrusion Detection, 7th European Conference on Intelligent Techniques and Soft Computing (EUFIT '99), Aachen, Germany (to appear).

Mukherjee, B., et al, 1994, Network Intrusion Detection, IEEE Network, 8(3), 26-41.

Porras, P. A. and Valdes, A., 1998, Live Traffic Analysis of TCP/IP Gateways, Proceeding of ISOC Symposium of Network and Distributed System Security. http://www.csl.sri.com/emerald/downloads.html