Statistical Intrusion Detector with Instance-Based Learning




Informatica 25 (2001) xxx yyy

Ivan Verdonik, Bojan Novak
Fakulteta za elektrotehniko, računalništvo, Univerza v Mariboru
Smetanova 17, 2000 Maribor, Slovenija
ivan.verdonik@siol.net

Keywords: intrusion detection, instance-based learning, reduction techniques

This paper deals with computer security. Within this very broad area we focus on intrusion detection and, specifically, on statistical detection. The statistical intrusion detector presented in the paper is based on Instance-based Learning with the k-Nearest Neighbours method. A statistical detector requires a good and small database of regular data to be able to validate the actual traffic correctly and promptly. We therefore considered techniques, based on clustering, for reducing the gathered data. We adjusted the k-Nearest Neighbours algorithm to compare a sequence of actual data with sequences of regular data, instead of comparing only one actual instance with its k nearest regular instances. For this purpose we explored four similarity measure functions. Finally, we present our security solution VAL (Varnost ALarm), which consists of our statistical detector, the SNORT rule-based intrusion detection system, an iptables Linux firewall and a management console.

1 Introduction

In addition to great opportunities and benefits for mankind, the emergence of global networking in the previous decade has also brought serious security threats to its users. Intrusion detection can be regarded as a tool that improves the security of a local network and/or of individual hosts. Intrusion Detection Systems (IDS) can prevent unauthorized access to system resources and data and catch the attacker in the act.

There are two main approaches to intrusion detection [6]: rule-based misuse detection and statistical anomaly detection. Each has its strong and weak points. Rule-based detectors are better for internal security (by that we mean security inside the company intranet). The strongest point of statistical detectors, on the other hand, is the detection of novel, previously unknown kinds of attacks, while they are weak at internal security.
It is therefore reasonable to combine a rule-based and a statistical detector into a hybrid detector. The latest trend is to let the intrusion detector block the attacker's IP address with a firewall. Such systems are called intrusion prevention systems. Our security solution VAL encompasses all these features.

2 Instance-based Learning

Instance-Based Learning (IBL) algorithms simply store the presented training examples together with their attribute lists and their outcomes (the database of regular data). When a new instance is encountered, a set of similar, related instances is retrieved from memory and used to classify the actual (new) instance according to the outcome of the majority of the related training instances [2]. The function that assigns an outcome to an instance is called the target function. In our case the outcome is either 0 (normal activity) or 1 (intrusion). The most common IBL methods for approximating the target function are:

- k-Nearest Neighbor
- Locally Weighted Regression
- Radial Basis Function

IBL approaches can construct a different approximation of the target function for each distinct new instance to be classified. Some techniques construct only a local approximation of the target function, which applies in the neighborhood of the new query instance, and never construct an approximation designed to perform well over the entire instance space. This is an advantage when the target function is very complex but can still be described by a collection of less complex local approximations [2].

2.1 k-Nearest Neighbour

The k-Nearest Neighbor algorithm is the most basic of all Instance-Based Learning (IBL) methods. The algorithm assumes that all instances correspond to points in the n-dimensional space $\mathbb{R}^n$. The nearest neighbors of an instance are defined in terms of standard Euclidean geometry (distances between points in n-dimensional space). More precisely, let an arbitrary instance $x$ be described by the attribute list $\langle a_1(x), a_2(x), a_3(x), \ldots, a_n(x) \rangle$, where $a_r(x)$ denotes the value of the r-th attribute of instance $x$. In our case the attribute list of an instance consists of TCP packet header parameters. The most important parameters are the source and destination IP addresses, the source and destination port numbers and the status of the flags.

The distance between two instances $x_i$ and $x_j$ [3] is given by the equation below. This is the general form for calculating distance in n-dimensional space.

$$d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \left[ a_r(x_i) - a_r(x_j) \right]^2}$$

Equation 1: Euclidean distance between two instances with n attributes

We do not use this distance equation exactly, since we test only the equality between attributes.

In nearest-neighbor learning, the target function may be either discrete-valued or real-valued. The discrete-valued target function has the form $f : \mathbb{R}^n \to V$, where $V = \{v_1, v_2, v_3, \ldots, v_s\}$ is a finite set (in our case $V = \{\text{regularity}, \text{intrusion}\}$) and $\mathbb{R}^n$ is the real n-dimensional space.
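To make the difference between Equation 1 and an equality-only comparison concrete, here is a minimal sketch. The mismatch count is only one plausible reading of "testing equality between attributes", and the packet tuples (source IP, destination IP, port, flags) are hypothetical values chosen for illustration, not data from the paper.

```python
import math


def euclidean_distance(a, b):
    """Equation 1: Euclidean distance between two numeric attribute lists."""
    if len(a) != len(b):
        raise ValueError("attribute lists must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mismatch_count(a, b):
    """Assumed equality-only distance: the number of attributes that differ."""
    if len(a) != len(b):
        raise ValueError("attribute lists must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical packets: (source IP, destination IP, port, TCP flags)
regular = ("10.0.0.5", "10.0.0.9", 80, "SYN")
actual = ("10.0.0.5", "10.0.0.9", 8080, "SYN")

print(euclidean_distance((0, 0), (3, 4)))
print(mismatch_count(regular, actual))
```

Equality testing suits attributes such as IP addresses and flag combinations, for which a numeric difference has no natural meaning.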
The k-Nearest Neighbours algorithm for approximating a discrete-valued target function [3] is given in Algorithm 1 below.

Training part:
  For each training example $\langle x, f(x) \rangle$, add the example to the list training_examples.
Classification part:
  Given a new instance $x_q$ to be classified:
  Step 1: Let $x_1, x_2, \ldots, x_k$ denote the k instances from training_examples that are nearest to $x_q$.
  Step 2: Return
    $$\hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))$$
  where $\delta(a, b) = 1$ if $a = b$ and $\delta(a, b) = 0$ if $a \neq b$.

Algorithm 1: k-Nearest Neighbours

In the training part we collect the training examples (instances), both their attribute lists and their targets. In our case the instances are TCP packets, and we collected the important header parameters as the attribute list. In the classification part we first search the training examples for the k instances closest to the actual (i.e. new) instance. Then we classify this instance according to the outcome of the majority of these nearest training instances. Our case is somewhat specific, since all our training examples are considered to be regular, i.e. all of them have the same outcome. A new instance is therefore considered regular if its attributes are close enough to the attribute lists of the training examples.

3 VAL Statistical Detector

As stated above, our statistical detector performs intrusion detection using an adapted IBL with the k-Nearest Neighbor method. We collected the training examples by recording

the TCP network activity on a computer plugged into the university department intranet for two weeks. The training examples collected in this way were highly redundant and noisy. To improve the quality and reduce the size of the gathered data, we first considered clustering methods. Clustering means partitioning the data space into disjoint subsets so that the points in each subset are coherent according to a certain criterion. Our idea was to group TCP network packets into sets of similar packets and to preserve only the packets in the centers of the groups. The methods we inspected are:

- k-Means
- Mixture of Gaussian distributions, fitted by Expectation-Maximization
- Greedy Clustering Algorithm

3.1 k-Means

k-Means is one of the simplest clustering algorithms. It assumes that the clusters are spherical, that every cluster has a center and that the other points belonging to the cluster lie close around the center [4]. See Algorithm 2 below.

Input: data points $\{x_1, x_2, \ldots, x_n\}$; number of clusters $K$
Output: $\mathrm{clust}(i)$ for $i = 1, \ldots, n$; $c_1, c_2, \ldots, c_K$, the positions of the centers
Initialize $c_1, c_2, \ldots, c_K$ with random values
Do
  for $i = 1, \ldots, n$:
    find $k$ such that $\|x_i - c_k\| \le \|x_i - c_{k'}\|$ for all $k' = 1, \ldots, K$
    $\mathrm{clust}(i) \leftarrow k$
  for $k = 1, \ldots, K$:
    $C_k = \{x_i : \mathrm{clust}(i) = k\}$
    $c_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i$
until $\mathrm{clust}(i)$, $i = 1, \ldots, n$, remain unchanged

Algorithm 2: k-Means clustering

The input consists of the data points and the number of clusters, while the output is the assignment of data points to clusters and the positions of the cluster centers. First, the cluster centers are initialized with random values. Then, in a loop, each data point is assigned to the cluster whose center is nearest to it, and the cluster centers are recalculated from all the points currently in each cluster. The loop iterates until the assignment of all the data points to the clusters remains unchanged. The k-Means algorithm fails to find the correct clustering when the clusters have different sizes and/or (different) elongated shapes [4].

3.2 Mixture of Gaussian distributions

Different models have to be used for clusters that are not spherical. One of them is a mixture of Gaussian distributions.
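Before turning to the mixture model, the k-Means loop of Algorithm 2 can be sketched as follows. This is an illustrative sketch only: the function and parameter names are hypothetical, the optional `initial_centers` argument is added so that a run can be reproduced, and an empty cluster simply keeps its previous center, a case Algorithm 2 does not address.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, initial_centers=None, seed=0, max_iter=100):
    """Return (assignment, centers) following the loop of Algorithm 2."""
    rng = random.Random(seed)
    # Random initialization unless explicit starting centers are supplied.
    if initial_centers is not None:
        centers = list(initial_centers)
    else:
        centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assign every point to the cluster with the nearest center.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments unchanged: the algorithm has converged
        assignment = new_assignment
        # Recompute each center as the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
assignment, centers = k_means(points, 2, initial_centers=[(0.0, 0.0), (10.0, 10.0)])
print(assignment, centers)
```

With two well-separated groups the loop stops after the second pass, once the assignments no longer change.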
A mixture of Gaussian distributions [4] is a probability density given by

$$f(x) = \sum_{k=1}^{K} \lambda_k f_k(x)$$

where:
- $f_k(x)$ are normal densities with parameters $\mu_k$, $\sigma_k^2$, called the mixture components,
- $\lambda_k \ge 0$ are real numbers satisfying $\sum_{k=1}^{K} \lambda_k = 1$, called the mixture coefficients.

Intuitively, adopting a mixture reflects the assumption that there are $K$ sources $(f_1, f_2, \ldots, f_K)$ which independently generate data. The probability that a data point is generated by $f_k$ is $\lambda_k$, so $(\lambda_1, \lambda_2, \ldots, \lambda_K)$ represents a discrete distribution over the sources. A new data point is generated in two steps: first, a source $f_k$ is randomly picked from $(f_1, f_2, \ldots, f_K)$ with the probability given by $(\lambda_1, \lambda_2, \ldots, \lambda_K)$; second, the data point $x$ is sampled from the chosen $f_k$. We know $x$, but we do not know $k$, the index of the source that generated $x$. Therefore $k$ is called the hidden variable [4].

$f(x)$ can be rewritten to show the two-step data generation model:

$$f(x) = \sum_{k=1}^{K} P(k)\, f(x \mid k)$$

where $P(k) = \lambda_k$ for $k = 1, \ldots, K$,

$f(x \mid k) = f_k(x)$.

In this probabilistic framework, the clustering problem can be translated as follows. Finding the clusters is equivalent to estimating the densities of the data sources $(f_1, f_2, \ldots, f_K)$. Assigning the data to the clusters means recovering the value of the hidden variable $k$ for each data point [4].

3.3 Expectation-Maximization

The Expectation-Maximization (EM) algorithm [4][5] solves the clustering problem as a Maximum Likelihood estimation problem. It is based on the mixture of Gaussian distributions. It takes the data $D = \{x_1, x_2, \ldots, x_n\}$ and the number of clusters $K$ as input and outputs the model parameters $\Theta = \{\lambda_1, \ldots, \lambda_K, \mu_1, \ldots, \mu_K, \sigma_1^2, \ldots, \sigma_K^2\}$ and the posterior probabilities of the clusters for each data point, $\gamma_i(k)$, for $i = 1, \ldots, n$ and $k = 1, \ldots, K$. For any given set of model parameters $\Theta$, we compute the probability $P(k \mid x_i)$ that observation $x_i$ was generated by the k-th source $f_k$ using the Bayes formula

$$P(k \mid x_i) = \frac{P(k)\, f(x_i \mid k)}{\sum_{k'} P(k')\, f(x_i \mid k')} = \frac{\lambda_k f_k(x_i)}{\sum_{k'} \lambda_{k'} f_{k'}(x_i)} = \gamma_i(k)$$

The values $\gamma_i(k)$, $k = 1, \ldots, K$, sum to 1. They are called the partial assignments of point $x_i$ to the $K$ clusters; see Algorithm 3. It can be proved that the EM algorithm converges. The parameters $\Theta$ obtained at convergence represent a local maximum of the likelihood $L(\Theta)$. The complexity of each iteration is $O(Kn)$.

Clustering methods based on EM are popular because they are general and often highly effective. However, when many local optima are present in the likelihood space, the quality of the solution produced can be sensitive to the initial assignment of points to clusters. A larger difficulty for the anomaly detection domain is that the number of clusters to be sought must be known a priori, yet it is not clear how to determine the number of natural clusters in a set of network packets with their parameters. Furthermore, for large $n$ the search time can be prohibitive [1].
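The partial assignments and the re-estimation of the parameters can be sketched for one-dimensional data as follows; Algorithm 3 gives the corresponding pseudocode. This is a minimal sketch under stated assumptions: the function names are hypothetical, a fixed iteration count stands in for a convergence test, the code stores standard deviations rather than variances, and a small floor on the standard deviation (`min_sigma`) is added only to avoid division by zero.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def e_step(data, lambdas, mus, sigmas):
    """Partial assignments gamma_i(k) computed with the Bayes formula."""
    gammas = []
    for x in data:
        weighted = [l * normal_pdf(x, m, s) for l, m, s in zip(lambdas, mus, sigmas)]
        total = sum(weighted)
        gammas.append([w / total for w in weighted])
    return gammas


def m_step(data, gammas, min_sigma=1e-6):
    """Re-estimate lambda_k, mu_k and sigma_k from the partial assignments."""
    n, num_clusters = len(data), len(gammas[0])
    lambdas, mus, sigmas = [], [], []
    for k in range(num_clusters):
        n_k = sum(g[k] for g in gammas)  # effective number of points in cluster k
        mu = sum(g[k] * x for g, x in zip(gammas, data)) / n_k
        var = sum(g[k] * (x - mu) ** 2 for g, x in zip(gammas, data)) / n_k
        lambdas.append(n_k / n)
        mus.append(mu)
        sigmas.append(max(math.sqrt(var), min_sigma))  # floor is an added safeguard
    return lambdas, mus, sigmas


def em(data, lambdas, mus, sigmas, iterations=50):
    """Alternate E and M steps for a fixed number of iterations."""
    for _ in range(iterations):
        gammas = e_step(data, lambdas, mus, sigmas)
        lambdas, mus, sigmas = m_step(data, gammas)
    return lambdas, mus, sigmas, gammas


data = [0.0, 0.2, -0.2, 10.0, 10.2, 9.8]
lambdas, mus, sigmas, gammas = em(data, [0.5, 0.5], [1.0, 9.0], [1.0, 1.0])
print(lambdas, mus, sigmas)
```

Starting the means elsewhere can lead to a different local maximum, which is the sensitivity to initialization noted above.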
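Input: $\{x_1, x_2, \ldots, x_n\}$, the data points; $K$, the number of clusters
Output: $\gamma_i(k)$ for $i = 1, \ldots, n$, $k = 1, \ldots, K$
        $\mu_k$, $\sigma_k^2$ for $k = 1, \ldots, K$, the parameters of the mixture components
        $\lambda_k$ for $k = 1, \ldots, K$, the mixture coefficients
Initialize $\mu_k$, $\sigma_k^2$, $\lambda_k$ for $k = 1, \ldots, K$ with random values
Do
  E step: for $i = 1, \ldots, n$:
    $\gamma_i(k) = \frac{\lambda_k f_k(x_i)}{\sum_{k'} \lambda_{k'} f_{k'}(x_i)}$ for $k = 1, \ldots, K$
  M step: for $k = 1, \ldots, K$:
    $\lambda_k = \frac{1}{n} \sum_{i=1}^{n} \gamma_i(k)$
    $\mu_k = \frac{\sum_{i=1}^{n} \gamma_i(k)\, x_i}{\sum_{i=1}^{n} \gamma_i(k)}$
    $\sigma_k^2 = \frac{\sum_{i=1}^{n} \gamma_i(k)\, (x_i - \mu_k)^2}{\sum_{i=1}^{n} \gamma_i(k)}$
until convergence

Algorithm 3: Expectation-Maximization

3.4 Greedy Clustering Algorithm

The greedy clustering algorithm [1] builds individual clusters consecutively, attempting to minimize the criterion

$$\mathrm{val}(C) = \frac{\sum_{x \in C} \sum_{y \in C} \mathrm{Dist}(x, y)}{|C|}$$

for each cluster $C$. Beginning with an initial point, the cluster grows by including the points which increase $\mathrm{val}(C)$ the least. Growth is stopped when the value reaches a local minimum. When the cluster is complete we define its center, i.e. the point which has the minimum distance to all other points in the cluster. Finally, the cluster is represented only by its center point and its mean radius. The complete clustering algorithm is similar to the single cluster construction. We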

sequentially select individual clusters by their ability to maximize the mean inter-cluster distance:

$$\mathrm{val}\{C_1, C_2, \ldots, C_K\} = \sum_{i,j} \mathrm{Dist}(C_{i,\mathrm{cent}}, C_{j,\mathrm{cent}})$$

We halt the clustering process when the inter-cluster value falls below a certain threshold. This parameter defines when the clustering process will be halted and how many clusters will be created. A small threshold results in many clusters and a large one in few clusters.

3.5 Our Algorithm

After considering all of these clustering methods and the number of network packets collected by recording the network traffic, which was greater than 100000, we had to find a computationally less demanding algorithm. First, we decided to discard all the packets whose combination of source IP, destination IP, port number and TCP flags appeared only once in the collection of packets. After that, we further reduced our collection by preserving only one packet among all those with the same combination of source IP, destination IP, port number and TCP flags. In this way, we reduced the number of packets to only about 500.

3.6 Similarity Measure

A decision about intrusion based on only one packet is certainly unreliable. Therefore, we decided to base the decision on whether there is an intrusion or not on a sequence of packets. We considered different kinds of similarity functions [1] for comparing the sequences of packets. Since an exact match between the involved sequences is not likely, we examined four variants of loosely matching similarity functions. Furthermore, we do not require all header data of two packets (one from the actual sequence and the other from the training examples sequence) to be the same, but only the source IP, the destination IP, the port and the flags.

The first of the functions, denoted MC-P (Match Count Polynomial), simply counts the number of matching positions between the sequences. The next similarity function is denoted MC-E (Match Count Exponential). This function doubles its value for each matching position between the sequences. The next two similarity functions are based on the feeling that adjacent matches should have stronger weight.
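The match-count functions just described and the two adjacency-weighted variants share one update scheme, which can be sketched as follows. The initial values and the update rules are a reading of Table 1 and should be treated as assumptions; the function name `similarity` and the example sequences are hypothetical.

```python
# Each measure: (initial Sim, f(Sim, c), u(c)), as read from Table 1 (assumed).
MEASURES = {
    "MC-P": (0, lambda sim, c: sim + 1, lambda c: 1),
    "MCA-P": (0, lambda sim, c: sim + c, lambda c: c + 1),
    "MC-E": (1, lambda sim, c: 2 * sim, lambda c: 1),
    "MCA-E": (0, lambda sim, c: sim + c, lambda c: 2 * c),
}


def similarity(actual, regular, measure="MC-P"):
    """Compare two equal-length sequences position by position."""
    if len(actual) != len(regular):
        raise ValueError("sequences must have the same length")
    sim, f, u = MEASURES[measure]
    c = 1  # adjacency counter
    for x, y in zip(actual, regular):
        if x == y:
            sim = f(sim, c)  # reward the match
            c = u(c)         # grow the adjacency counter
        else:
            c = 1            # a mismatch resets the adjacency counter
    return sim


# Hypothetical sequences of packet signatures, reduced to single letters.
X = list("abcde")
Y = list("abxde")
for name in MEASURES:
    print(name, similarity(X, Y, name))
```

In this example the sequences match at four of five positions, in two runs of two adjacent matches, so the adjacency-weighted variants score higher than the plain match count.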
Therefore we explored the MCA-P (Match Count Adjacency Polynomial) and the MCA-E (Match Count Adjacency Exponential) functions. The similarity measure is computed in the same way in all four cases (only the functions differ); see Algorithm 4 below.

Set an adjacency counter $c$ to one ($c = 1$) and the similarity measure $Sim$ to its initial value.
For each position $j$ in the sequence of length $l$:
  If $X_j = Y_j$ then $Sim = f(Sim, c)$ and $c = u(c)$,
  otherwise $c = 1$.
After all positions are examined, return the measure value.

Algorithm 4: Similarity measure computation

Here $X = (x_1, x_2, \ldots, x_l)$ is a sequence of $l$ actual packets and $Y = (y_1, y_2, \ldots, y_l)$ is a sequence of $l$ training examples. The definitions of $f(Sim, c)$ and $u(c)$ for all four types of similarity measure are given in Table 1 below.

         Initial Sim   f(Sim, c)   u(c)
MC-P     0             Sim + 1     1
MCA-P    0             Sim + c     c + 1
MC-E     1             2 * Sim     1
MCA-E    0             Sim + c     2 * c

Table 1: Functions for different similarity measure computations

We found the similarity functions to be statistically indistinguishable, so we used MC-P.

3.7 Other parts of VAL

We combined our VAL statistical detector with SNORT, a GNU-licensed, lightweight, rule-based intrusion detector. It is used not only for rule-based detection; it also serves for

TCP network traffic capture. The traffic is stored in a MySQL database, from where it is accessed by the statistical detector, which is written in GNU C. The original database schema defined by SNORT was adjusted and extended. In this way, we produced a hybrid intrusion detection system. Furthermore, we incorporated the Linux iptables personal firewall to block hostile activities detected either by SNORT or by the statistical detector. Additionally, an e-mail is sent to the security administrator if an intrusion is detected. We also built a web management console, written in PHP, with access to the same MySQL database for administrative and informative purposes.

4 Results

To test the statistical detector, we used the Nessus Vulnerability Scanner and generated the attacks ourselves. We executed a whole range of attacks and obtained the following results:

- Total number of packets: 249003
- Number of captured packets: 234213, or 94.060%
- Number of packets not captured: 14790, or 5.940%
- Number of detected intrusive packets: 21703, or 95.273% (among captured)
- Number of undetected intrusive packets: 1077, or 4.727% (among captured)

The results are relatively satisfying. However, under a greater regular traffic load the results would probably deteriorate. Also, if the attacker goes slow and low, most likely nothing would be detected. However, no statistical detector performs better in similar conditions.

5 Conclusion

Security threats to our computer systems can be reduced with the help of an intrusion detection system. A statistical detector performs intrusion detection by comparing the current activity with a knowledge base of regular activity. Our VAL statistical detector uses Instance-based Learning with the k-Nearest Neighbor method. To improve the quality of the gathered data and to reduce its quantity, we examined various clustering algorithms and finally used our own. Then we inspected functions for similarity measure computation. Since we found no significant difference in their quality, we used MC-P. We completed our solution with SNORT, the firewall and the management console.
Testing has shown that our detector, combined with the other components, secures a computer connected to the Internet quite well despite its simple construction.

Acknowledgement

The authors are thankful to Miha Strehar for sharing his experience with intrusion detection and for his help with network traffic acquisition and solution testing.

References

[1] Terran Lane: Machine Learning Techniques for the Computer Security Domain of Anomaly Detection, thesis submitted to the Faculty of Purdue University, 2000
[2] D. Aha, D. Kibler, M. Albert: Instance-Based Learning Algorithms, Machine Learning, 1991
[3] B. V. Dasarathy: Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques, IEEE Computer Society Press, 1991
[4] D.R. Wilson, T.R. Martinez: Reduction Techniques for Exemplar-Based Learning Algorithms, Machine Learning, 2000
[5] T.K. Moon: The Expectation-Maximization Algorithm, IEEE Signal Processing Magazine, 1996
[6] Stephen Northcutt: Network Intrusion Detection: An Analyst's Handbook, New Riders, 1999