A Hierarchical Anomaly Network Intrusion Detection System using Neural Network Classification

ZHENG ZHANG, JUN LI, C. N. MANIKOPOULOS, JAY JORGENSON and JOSE UCLES
ECE Department, New Jersey Inst. of Tech., University Heights, Newark, NJ 07102, USA
Department of Mathematics, CUNY, Convent Ave. at 138 ST., New York, NY 0003, USA
Network Security Solutions, 5 Independence Blvd. 3rd FL., Warren, NJ 07059, USA

Abstract: - In this paper, we introduce a hierarchical anomaly network intrusion detection system, which is capable of detecting network-based attacks using statistical preprocessing models and neural network classification. The sample network used has a three-tier hierarchy, where the lower-tier detectors report to the higher tiers. The statistical preprocessor converts network traffic sample information into a PDF (probability density function) that is compared to a historically developed PDF for corresponding normal network traffic, thus deriving a statistical similarity decision vector that the neural network classifies as an anomalous (attack) or normal instance. Several simulation experiments have been carried out focusing on the Denial of Service attack. We used the Perceptron-Backpropagation-Hybrid (PBH) as the neural net classifier, which showed fast convergence (only a few epochs needed) and a small number of weights. The classification results are characterized by low misclassification error rates, for both false positives and false negatives.

Key-Words: - Security, Intrusion Detection, Statistical Preprocessing, Neural Network Classification, Perceptron-Backpropagation-Hybrid, PBH, Anomaly Detection

1 Introduction
The basic assumption of intrusion detection is that an intruder's behavior will be noticeably different from that of legitimate users. Most intrusion detection systems are developed along two complementary trends: misuse detection and anomaly detection. Misuse detection systems, such as [1][2], search for evidence of attacks based on the knowledge accumulated from known attacks and security gaps.
Anomaly detection systems, such as [3][4][8], identify intrusions by observing a deviation from the normal or expected behavior of the systems or users. Many technologies have been developed to detect possible attacks. For example, NIDES [3] represents user or system behaviors by a set of statistical variables and detects the deviation between the observed and the standard activities. JAM [2] uses data mining approaches to extract features of attackers and normal users. A system that identifies intrusions using packet filtering and neural networks is introduced in [5].

This paper presents the prototype of a hierarchical anomaly network intrusion detection system that uses statistical models and neural networks to detect attacks. Section 2 describes the details of the system architecture, the statistical models and the neural networks used in the system. Section 3 introduces the test bed and the attack schemes we simulated. Some experimental results are also reported in that section. Section 4 draws some conclusions and outlines future work.

2 System Architecture
Our system is a distributed hierarchical application, which consists of several tiers, with each tier composed of several agents. Agents are IDS components that monitor the activities of a host or a network. Different tiers correspond to the different network scopes that their agents protect.

Fig. 1 Sample Network

For the sample network given in Fig. 1, the intrusion detection system can be divided into 3 tiers. Tier 1 agents monitor system activities of the servers and bridges within a department and periodically generate reports for Tier 2 agents. Tier 2 agents detect the network status of a departmental LAN based on the network traffic that they observe as well as the reports from the Tier 1 agents within the LAN. Tier 3 agents collect data from the Tier 1 agents at the firewall and the router as well as data from the Tier 2 agents at the departmental LANs. The system hierarchy is shown in Fig. 2.

Fig. 2 System Hierarchy

Subsequent subsections are organized as follows: subsection 2.1 introduces the structure of Intrusion Detection Agents (IDA); subsection 2.2 describes the statistical model of the IDA; and subsection 2.3 discusses the neural networks used in this system.

2.1 Intrusion Detection Agent (IDA)
Because this system is distributed and hierarchical, the IDAs of all tiers have the same structure. A diagram of an IDA is illustrated in Fig. 3. An IDA consists of the following components: the probe, the event preprocessor, the statistical model, the neural networks and the post processor. The functionalities of these components are described below.

Fig. 3 Intrusion Detection Agent

Probe: collects the network traffic of a host or a network, abstracts the traffic into a set of statistical variables to reflect the network status, and periodically generates reports to the event preprocessor.
Event Preprocessor: receives reports from both the probe and IDAs of lower tiers, and converts them into the format required by the statistical model.
Statistical Model: maintains a reference model of the normal network activities, compares the reference model with the reports from the event preprocessor, and forms a stimulus vector to feed into the neural networks. We will further discuss the statistical algorithms in subsection 2.2.
Neural Networks: analyze the stimulus vector from the statistical model to decide whether the network traffic is normal or not. Subsection 2.3 will introduce the neural networks used in the system in detail.
Post Processor: generates reports for the agents at higher tiers. At the same time, it displays the detected results through a user interface.

2.2 Statistical Model
Statistics have been used in anomaly intrusion detection systems [3][4]; however, most of these systems simply measure the means and the variances of some variables and detect whether certain thresholds are exceeded. SRI's NIDES [6][3] developed a more sophisticated statistical algorithm by using a $\chi^2$-like test to measure the similarity between short-term and long-term profiles. Our current

statistical model uses an algorithm similar to that of NIDES but with major modifications. Therefore, we will first briefly introduce some basic information about the NIDES statistical algorithm.

In NIDES, user profiles are represented by a number of probability density functions. Let $S$ be the sample space of a random variable and let the events $E_1, E_2, \ldots, E_k$ be a mutually exclusive partition of $S$. Assume that $p_i$ is the expected probability of the occurrence of event $E_i$, that $p'_i$ is the actual probability of the occurrence of $E_i$ during a time interval, and that $N$ is the total number of occurrences. The NIDES statistical algorithm uses a $\chi^2$-like test to determine the similarity between the expected and actual distributions with the equation below:

$$Q = N \sum_{i=1}^{k} \frac{(p'_i - p_i)^2}{p_i}$$

When $N$ is large and the events $E_1, E_2, \ldots, E_k$ are independent, $Q$ approximately follows the $\chi^2$ distribution with $(k - 1)$ degrees of freedom. However, in real applications the above two assumptions generally cannot be guaranteed, so empirically $Q$ may not follow the $\chi^2$ distribution. NIDES solved this problem by building an empirical probability distribution for $Q$, which is updated daily in real-time operation.

In our system, since we are using neural networks to identify possible intrusions, we are not so concerned with the actual distribution of $Q$. However, because network traffic is not stationary and network-based attacks may have different time durations, varying from a couple of seconds to several hours, we need an algorithm that is capable of efficiently monitoring network traffic with different time windows. Based on the above observations, we used a layer-window statistical model (Fig. 4), with each layer-window corresponding to one detection time slice. The newly arrived events are first stored in the event buffer of layer 1. The stored events are compared with the reference model of that layer, and the results are fed into the neural networks to detect the network status during that time window.
The event buffer is emptied once it becomes full, and the stored events are averaged and forwarded to the event buffer of layer 2. The same process is repeated recursively until it arrives at the top level, where the events are simply dropped after processing.

Fig. 4 Statistical Model (layer-windows 1 to M, each with an event buffer and a reference model)

The similarity-measuring algorithm that we are using is shown below:

$$Q = f(N) \cdot \left[ \sum_{i=1}^{k} |p'_i - p_i| + \max_{i=1}^{k} |p'_i - p_i| \right]$$

where $f(N)$ is a function that takes into account the total number of occurrences during a time window.

Besides similarity measurements, we also designed an algorithm for the real-time updating of the reference model. Let $p_{old}$ be the reference model before updating, $p_{new}$ be the reference model after updating, and $p_{obs}$ be the observed user activity within a time window. The formula to update the reference model is

$$p_{new} = s \cdot \alpha \cdot p_{obs} + (1 - s \cdot \alpha) \cdot p_{old}$$

in which $\alpha$ is the predefined adaptation rate and $s$ is a value generated from the output of the neural network. Assume that the output of the neural network, $t$, is a continuous variable between $-1$ and $1$, where $-1$ means intrusion with absolute certainty and $1$ means no intrusion, again with complete confidence. In between, the values of $t$ indicate proportional levels of certainty. The function for calculating $s$ is

$$s = \begin{cases} t, & \text{if } t \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

Through the above equations, we ensure that the reference model is updated actively for normal traffic while it is kept unchanged when attacks occur. The attack events are diverted and stored for use as attack scripts in neural network learning.
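As a minimal sketch of the similarity measure and the reference-model update described in this subsection, the following Python computes Q and applies the gated update. It assumes that the similarity measure adds the sum of the absolute differences to their maximum, and it uses f(N) = 1 as a placeholder because the form of f(N) is not specified here. The function names and sample probabilities are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence


def similarity(p_ref: Sequence[float], p_obs: Sequence[float], n: int,
               f: Callable[[int], float] = lambda n: 1.0) -> float:
    """Q = f(N) * (sum_i |p'_i - p_i| + max_i |p'_i - p_i|).

    f defaults to the constant 1.0, which is an assumption of this sketch.
    """
    diffs = [abs(o - r) for r, o in zip(p_ref, p_obs)]
    return f(n) * (sum(diffs) + max(diffs))


def update_gate(t: float) -> float:
    """s = t if t >= 0, else 0, for a network output t in [-1, 1]."""
    return t if t >= 0 else 0.0


def update_reference(p_old: Sequence[float], p_obs: Sequence[float],
                     t: float, alpha: float) -> List[float]:
    """p_new = s*alpha*p_obs + (1 - s*alpha)*p_old, applied per event bin."""
    w = update_gate(t) * alpha
    return [w * o + (1.0 - w) * r for r, o in zip(p_old, p_obs)]


# Hypothetical reference and observed distributions over three event bins.
reference = [0.5, 0.3, 0.2]
observed = [0.4, 0.3, 0.3]

q = similarity(reference, observed, n=100)
after_normal = update_reference(reference, observed, t=1.0, alpha=0.5)
after_attack = update_reference(reference, observed, t=-1.0, alpha=0.5)
```

With these sample values the reference model moves halfway toward the observation when the network reports normal traffic with full confidence (t = 1), and it is left untouched when the network reports an attack (t = -1), which is the behavior the update rule is designed to give.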

2.3 Neural Networks
Neural networks are widely considered an efficient approach to adaptively classifying patterns, but their high computational intensity and long training cycles have greatly hindered their application. In [5][8], BP neural networks were used to detect anomalous user activities. BP networks are excellent at finding the nonlinear correlations between inputs and outputs, but the large number of hidden neurons makes the architecture computationally inefficient. In our application, we believe both linear and nonlinear correlations exist between the stimulus vectors and the output; therefore, we employed a hybrid neural network paradigm [7], called the perceptron-backpropagation-hybrid (PBH) network, which is a superposition of a perceptron and a small backpropagation network. A diagram of the PBH architecture is illustrated in Fig. 5.

Fig. 5 PBH architecture (inputs I1, I2, ..., In-1, In; hidden neurons H1, H2; output O)

In our experiments, we used PBH networks with 4 hidden neurons. As we will see in the next section, these neural networks performed very efficiently.

3 Experimental Results
In this section, we present our simulation approach and the results of applying our statistical models and the PBH neural network to detect network-based attacks. First, the testbed configuration and the simulation specifications are introduced in subsection 3.1, and then subsection 3.2 reports the testing results.

3.1 Testbed
We used a virtual network, built with simulation tools, to generate attack scenarios. The experimental testbed that we built using OPNET, a powerful network simulation facility, is shown in Fig. 6. The testbed consists of 3 10BaseX LANs, interconnected by 2 routers.

Fig. 6 Simulation Testbed

We simulated the UDP flooding attack within the testbed. To test our system extensively, we ran two independent scenarios with different traffic loads and characteristics. Table 1 lists the traffic loads of the two simulation scenarios.

Table 1 Traffic Loads of Two Simulation Scenarios
Columns: TCP background traffic (Mbps), UDP background traffic (Mbps), Attack traffic (Mbps)
Rows: Scenario 1, Scenario 2
Values: 2.05.05.08 6.82.8.8

3.2 Results
For each simulation scenario, we collected 10,000 records of network traffic. We evenly divided these data into two separate sets, one for training and the other for testing. In each scenario, the system was trained for 50 epochs. The mean squared root (MSR) errors of the outputs of the two scenarios are shown in Fig. 7 and Fig. 8. From the graphs, we can see that the MSR errors of both

scenarios decrease very quickly during the first few epochs, reaching satisfactory convergence levels within the first ten epochs or so. As the training continues, the MSR errors of Scenario 1 and Scenario 2 approach 0.005 and 0.05, respectively.

Fig. 7 MSR Error of Scenario 1
Fig. 8 MSR Error of Scenario 2
Fig. 9 Error Probabilities of Scenario 1
Fig. 10 Error Probabilities of Scenario 2

The misclassification probabilities of the outputs of the two scenarios are shown in Fig. 9 and Fig. 10, in which we calculated the false-positive probabilities, i.e., the probabilities of classifying normal traffic as intrusion, and the false-negative probabilities, i.e., the probabilities of failing to identify intrusion, as well as the overall misclassification probabilities, which are the sum of the false-positive and false-negative probabilities. These graphs show trends similar to those of Figs. 7 and 8 for the MSR errors.

From the simulation results, we can see that, in both scenarios, the system converged very quickly, within several epochs, and with high accuracy. These features are especially desirable for network intrusion detection systems, which need real-time monitoring and online training.

4 Conclusions
In this paper, we described our design of a hierarchical network intrusion detection system. We discussed the system hierarchy, the statistical preprocessing modules and the neural network classifier. We then discussed the simulation experiments that we carried out for the Denial of Service attack script. The results showed fast convergence for the neural net classifier and low misclassification rates. Thus, these experiments showed that the proposed approach is effective and bears much promise.
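The error probabilities defined in subsection 3.2 can be illustrated with a short sketch. It assumes the output convention of subsection 2.2 (a negative classifier output indicates intrusion), a decision threshold of 0, and normalization of each count by the total number of test records, so that the overall misclassification probability equals the sum of the other two. The function name, the threshold and the sample data are hypothetical and are not taken from the experiments.

```python
from typing import Sequence, Tuple


def error_probabilities(outputs: Sequence[float],
                        is_attack: Sequence[bool],
                        threshold: float = 0.0) -> Tuple[float, float, float]:
    """Return (false_positive, false_negative, overall) probabilities.

    An output below `threshold` is read as "intrusion" (assumed convention).
    All three values are fractions of the total number of records.
    """
    if len(outputs) != len(is_attack) or len(outputs) == 0:
        raise ValueError("outputs and is_attack must be non-empty and equal in length")
    n = len(outputs)
    # Normal traffic flagged as intrusion.
    fp = sum(1 for t, a in zip(outputs, is_attack) if not a and t < threshold)
    # Attack traffic that was not flagged.
    fn = sum(1 for t, a in zip(outputs, is_attack) if a and t >= threshold)
    return fp / n, fn / n, (fp + fn) / n


# Hypothetical classifier outputs and ground-truth labels for four records.
sample_outputs = [0.9, -0.2, -0.8, 0.3]
sample_labels = [False, False, True, True]
fp_rate, fn_rate, overall = error_probabilities(sample_outputs, sample_labels)
```

In this made-up sample, one normal record is flagged and one attack record is missed, so each error type accounts for a quarter of the records and the overall misclassification probability is one half.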

Acknowledgements
Our research was partially supported by a Phase I SBIR contract with the US Army. We would also like to thank OPNET Technologies, Inc.™, for providing the OPNET simulation software.

References:
[1] Giovanni Vigna, Richard A. Kemmerer, NetSTAT: A Network-based Intrusion Detection Approach, Proceedings of the 14th Annual Computer Security Applications Conference, 1998, pp. 25-34.
[2] W. Lee, S. J. Stolfo, K. Mok, A Data Mining Framework for Building Intrusion Detection Models, Proceedings of the 1999 IEEE Symposium on Security and Privacy, pp. 120-132.
[3] A. Valdes, D. Anderson, Statistical Methods for Computer Usage Anomaly Detection Using NIDES, Technical report, SRI International, January 1995.
[4] Terran Lane, Carla E. Brodley, Temporal Sequence Learning and Data Reduction for Anomaly Detection, Vol. 2, No. 3, August 1999, pp. 295-331.
[5] J. M. Bonifacio, et al., Neural Networks Applied in Intrusion Detection Systems, IEEE, 1998, pp. 205-210.
[6] H. S. Javitz, A. Valdes, The NIDES Statistical Component: Description and Justification, Technical report, SRI International, March 1993.
[7] R. M. Dillon, C. N. Manikopoulos, Neural Net Nonlinear Prediction for Speech Data, IEEE Electronics Letters, Vol. 27, Issue 10, May 1991, pp. 824-826.
[8] A. K. Ghosh, J. Wanken, F. Charron, Detecting Anomalous and Unknown Intrusions Against Programs, Proceedings of the IEEE 14th Annual Computer Security Applications Conference, 1998, pp. 259-267.