Feature selection for intrusion detection. Slobodan Petrović, NISlab, Gjøvik University College
Contents: The feature selection problem; Intrusion detection; Traffic features relevant for IDS; The CFS measure; The mRMR measure; Solving the optimization problems.
The feature selection problem. The curse of dimensionality: two points that are close in a 2-dimensional space are probably distant in a 100-dimensional space. Any machine learning algorithm makes predictions for unseen data points by means of a hypothesis constructed from a limited number of training instances. In a high-dimensional space this is difficult.
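The effect of dimensionality on distances can be checked numerically. The following is a minimal sketch, assuming points drawn uniformly at random from the unit cube (an assumption made only for this illustration; the function name `mean_pairwise_distance` is hypothetical).

```python
import math
import random

def mean_pairwise_distance(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """Average Euclidean distance between random points in the unit cube [0, 1]^dim."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    total, pairs = 0.0, 0
    for i in range(n_points):
        for j in range(i + 1, n_points):
            total += math.dist(points[i], points[j])
            pairs += 1
    return total / pairs

# The average distance grows as coordinates (features) are added.
for dim in (2, 10, 100):
    print(dim, round(mean_pairwise_distance(dim), 2))
```

With the same number of training instances, the points become much sparser as the dimension grows, which is the difficulty the slide refers to.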
The feature selection problem. Hypothesis (in this setting): a pattern or function that predicts classes based on given data. Hypothesis space: contains all the hypotheses that can be learned from data.
The feature selection problem. A linear increase in the number of features (i.e. the dimension of the feature space) leads to an exponential increase of the hypothesis space. Example: 2 classes, N binary features. The cardinality of the hypothesis space is $2^{2^N}$.
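The count in the example follows from the fact that N binary features give 2^N distinct feature vectors, and a hypothesis assigns one of the classes to each of them. A short sketch (the function name is hypothetical):

```python
def hypothesis_space_size(n_features: int, n_classes: int = 2) -> int:
    """Number of distinct labelings of all 2**n_features binary feature vectors."""
    n_inputs = 2 ** n_features      # distinct binary feature vectors
    return n_classes ** n_inputs    # one class choice per feature vector

# Adding one feature squares the size of the hypothesis space.
for n in range(1, 6):
    print(n, hypothesis_space_size(n))
```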
The feature selection problem. Feature selection removes irrelevant features and removes redundant features. Consequences: efficient reduction of the hypothesis space; it is easier to find the correct hypothesis; the number of required training instances is reduced (the reduction is exponential).
The feature selection problem. Removing irrelevant features does not affect learning performance. Removing redundant features: redundant features are a type of irrelevant features. The difference is that a redundant feature requires the co-presence of another feature. Each individual feature is relevant, but removal of one of them will not affect learning performance.
The feature selection problem. Two types of feature selection methods. Feature ranking: rank features according to some criterion and select the top k features; a threshold is needed in advance to select the top k features. Feature subset selection: selects the minimum subset of features that does not deteriorate learning performance; no threshold is necessary.
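Feature ranking can be sketched in a few lines. The feature names and scores below are hypothetical and serve only as an illustration; the ranking criterion itself is left open, as on the slide.

```python
def rank_features(scores: dict[str, float], k: int) -> list[str]:
    """Return the names of the k features with the highest score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

# Hypothetical per-feature relevance scores, for illustration only.
scores = {"duration": 0.12, "src_bytes": 0.71, "dst_port": 0.55, "flag": 0.33}
top2 = rank_features(scores, k=2)
print(top2)
```

Note that k must be fixed in advance, and that each feature is scored in isolation, so two redundant features may both be kept. Feature subset selection avoids both issues by evaluating whole subsets.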
The feature selection problem. Models of feature selection. The filter model: considers statistical properties of a data set directly; no learning algorithm is involved; efficient. The wrapper model: the performance of a given learning algorithm is used to determine the quality of the selected features.
Intrusion detection. Intrusion: activities aimed at violating security (i.e. confidentiality, integrity and availability of computer and network resources). Intrusion detection: the process of detection and identification of attacks. Intrusion prevention: the process of attack detection and defense management.
Intrusion detection. Intrusion detection system (IDS): a system that automatically detects attacks against hosts and networks. Intrusion prevention system (IPS): a system whose ambition is to detect attacks and manage defence activities. An IPS contains an IDS. IPSs combine an IDS with other preventive measures (firewall, anti-virus, vulnerability scanning, etc.).
Intrusion detection. IDS classification. According to the protected object: host-based IDS; network-based IDS. According to the detection model: misuse detection IDS; anomaly detection IDS.
Intrusion detection. Host-based IDS: collect data from internal sources, usually at the operating system level (various logs); monitor user activities; monitor execution of system programs.
Intrusion detection. Network-based IDS: collect packets, usually by means of network interfaces in so-called promiscuous mode (such a device collects all the packets that reach the interface, not only those addressed to the host); perform analysis of the collected packets; monitor network activity.
Intrusion detection. Misuse detection systems: collect information about attack indicators and then determine whether those indicators are present in incoming data. [Diagram labels: attack indicators (signatures); analysis (e.g. pattern matching); attack; activities]
Intrusion detection. Anomaly detection systems: define profiles of normal behaviour of users or networks, compare actual behaviour with those profiles and generate alerts if the discrepancy from the profiles is too high. [Diagram labels: profiles of normal behaviour; analysis; attack; activities]
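The contrast between the two detection models can be sketched as follows. This is a toy illustration under stated assumptions: the signature strings, the substring matching, and the three-standard-deviations threshold are all hypothetical choices, not taken from the slides.

```python
# Hypothetical attack signatures, for illustration only.
SIGNATURES = ("/etc/passwd", "' OR 1=1", "<script>")

def misuse_detect(payload: str) -> bool:
    """Misuse detection: alert if any known attack signature occurs in the data."""
    return any(sig in payload for sig in SIGNATURES)

def anomaly_detect(observed: float, profile_mean: float, profile_std: float,
                   threshold: float = 3.0) -> bool:
    """Anomaly detection: alert if the observation deviates too far from the normal profile."""
    return abs(observed - profile_mean) > threshold * profile_std

print(misuse_detect("GET /../../etc/passwd HTTP/1.1"))
print(anomaly_detect(observed=500.0, profile_mean=100.0, profile_std=20.0))
```

Misuse detection can only flag what is already described by a signature, while anomaly detection flags anything sufficiently far from the normal profile, whether or not it is an attack.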
Intrusion detection. [Diagram of IDS components: incoming traffic/logs; data pre-processor; activity data; detection model(s); detection algorithm; alerts; decision criteria; alert filter; action/report]
Intrusion detection. What properties should the data pre-processor possess? Which detection model is optimal? What is the best detection algorithm? What are the optimal decision criteria? What alert filter gives the best results?
Intrusion detection. Until recently, the answers to those questions were heuristic: intrusion detection was a more technical discipline, without a clear theoretical foundation. After 2005, some theoretical models of IDS appeared: models based on complexity theory; information-theoretic models.
Intrusion detection. IDS model (1). An IDS is an 8-tuple. The first 4 components are data structures: the data source; the set of data states Σ; the set of data unit features; the knowledge base about data profiles.
Intrusion detection. IDS model (2). The second 4 components are algorithms: the algorithm for feature selection; the algorithm for reduction and representation; the knowledge base generator; the classification algorithm.
Intrusion detection. Data source: a flow of consecutive data units (packets, data flow units, system calls), $(D_1, D_2, \ldots)$, where $D_i$ is the analyzed data unit, $D_i \in \{d_1, d_2, \ldots\}$, and $d_j$ is a possible data unit. In a network-based IDS, the data source is a stream of packets $P = (P_1, P_2, \ldots)$. In a host-based IDS, the data source can be a stream of system calls $C = (C_1, C_2, \ldots)$.
Intrusion detection. The set of data states Σ: contains normality indicators for each $D_i$. If $D_i$ is abnormal, the corresponding indicator from Σ may also contain the type of the attack. In anomaly detection, Σ = {normal, abnormal}, or Σ = {N, A}, or Σ = {0, 1}. In misuse detection, Σ = {normal, attack type 1, attack type 2, ...}, or Σ = {N, A_1, A_2, ...}.
Intrusion detection. The set of data unit features: a vector of features that contains a finite number of attributes of a data unit, $F = (f_1, f_2, \ldots, f_n)$. Examples: protocol name, port number, etc. Every feature has its own domain, a set of discrete or continuous values.
Intrusion detection. Knowledge base about data profiles: contains profiles of normal and abnormal data units. The internal structure of the base is different for each IDS (a tree, a Markov model, a Petri net, a set of rules, a base of attack signatures, etc.). In misuse-based systems, the knowledge base is a set of rules that describe attack profiles (i.e. attack signatures). In anomaly detection systems, the knowledge base is a profile of normal traffic.
Intrusion detection. An ideal data unit tester, $\mathrm{Oracle}_{IDS}$: performs analysis of each data unit $D_i$; gives the indicator value at the output (normal or abnormal); always gives the correct value of the indicator. For each $D_i$, its state is $\mathrm{Oracle}_{IDS}(D_i)$.
Intrusion detection. Algorithm for feature selection: given the data source and the corresponding states from Σ, the algorithm yields a certain number of features that the IDS will measure and base its decisions on. In general, the algorithm depends very much on knowledge about the attack characteristics. The quality of the selected features mainly determines the effectiveness of the IDS.
Intrusion detection. Feature selection. [Figure]
Intrusion detection. Algorithm for reduction and representation: during data processing, the IDS first performs data reduction, i.e. extraction of the characteristics that result from the execution of the feature selection algorithm, and then represents them in the form of a vector whose coordinates lie in the domains of the selected features. Thus, the algorithm maps each data unit to a feature vector.
Intrusion detection. Knowledge base generator: to generate the knowledge base, we need an algorithm that, based on the vectorial data representations and their states, generates the knowledge base.
Intrusion detection. Knowledge base generation. [Figure]
Intrusion detection. Classification algorithm: a function that maps the representation of a given data unit into some state, based on the knowledge base. Formally, it maps data unit representations to Σ.
Intrusion detection. Detection procedure (classification). [Figure]
Intrusion detection. Phases of operation of an IDS (1). Feature selection: in general, this phase is executed only once, during the development of the IDS. Knowledge base generation: sometimes called the training procedure; the knowledge base generator (with the help of the reduction and representation algorithm) is executed over a large quantity of training data; in general it is executed once, but the base may occasionally be updated.
Intrusion detection. Phases of operation of an IDS (2). Detection procedure: the IDS is applied to real data in order to detect attacks. This is the most important and most often used phase.
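The three phases can be put together in a toy pipeline. This is a sketch under stated assumptions, not the model's actual algorithms: the selected features in `SELECTED` are hypothetical, the knowledge base is simply one mean vector per state, and classification uses the nearest mean (a nearest-centroid rule swapped in for illustration).

```python
from collections import defaultdict
import math

# Hypothetical output of the feature selection phase (executed once).
SELECTED = ("duration", "bytes")

def represent(unit: dict) -> tuple:
    """Reduction and representation: keep only the selected features, as a vector."""
    return tuple(float(unit[name]) for name in SELECTED)

def generate_knowledge_base(units: list, states: list) -> dict:
    """Knowledge base generation (training): one mean vector (profile) per state."""
    groups = defaultdict(list)
    for unit, state in zip(units, states):
        groups[state].append(represent(unit))
    return {state: tuple(sum(col) / len(vecs) for col in zip(*vecs))
            for state, vecs in groups.items()}

def classify(unit: dict, kb: dict) -> str:
    """Detection: map a data unit to the state whose profile is nearest."""
    vec = represent(unit)
    return min(kb, key=lambda state: math.dist(vec, kb[state]))

# Hypothetical labeled training data with states from Σ = {N, A}.
training = [
    {"duration": 1, "bytes": 200, "ttl": 64},
    {"duration": 2, "bytes": 220, "ttl": 64},
    {"duration": 30, "bytes": 9000, "ttl": 128},
    {"duration": 28, "bytes": 8800, "ttl": 60},
]
labels = ["N", "N", "A", "A"]
kb = generate_knowledge_base(training, labels)
print(classify({"duration": 3, "bytes": 210, "ttl": 64}, kb))
```

The feature `ttl` is present in every data unit but never reaches the knowledge base or the classifier, because the representation step discards everything outside the selected features.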
Traffic features relevant for IDS. The goal of the feature selection algorithm in an IDS: to determine the most relevant features of the incoming traffic, whose monitoring ensures reliable detection of abnormal behavior. The effectiveness of the classification heavily depends on the number of features. It is necessary to minimize that number without dropping indicators of abnormal behavior.
Traffic features relevant for IDS. In contemporary IDS, most of the work on feature selection is still done manually. Feature selection depends too much on expert knowledge, which is unreliable. Better algorithms for automatic feature selection in IDS are needed.
Traffic features relevant for IDS. For IDS: due to high-dimensional data, the filter model is more appropriate for automatic feature selection. To eliminate redundant features, the feature subset evaluating method seems to be better than the feature ranking method. A generic feature selection measure is defined first, and then methods to maximize it are found.
Traffic features relevant for IDS. The generic feature selection measure (*):
$$\mathrm{GeFS}(X) = \frac{a_0 + \sum_{i=1}^{n} A_i(X)\, x_i}{b_0 + \sum_{i=1}^{n} B_i(X)\, x_i}, \qquad X = (x_1, \ldots, x_n) \in \{0,1\}^n$$
Here $x_i = 1$ indicates the appearance of the feature $f_i$, $a_0$ and $b_0$ are constants, and $A_i(X)$ and $B_i(X)$ are linear functions. The feature selection problem: find $X \in \{0,1\}^n$ that maximizes $\mathrm{GeFS}(X)$.
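The generic measure can be evaluated directly for small n. The sketch below is a minimal illustration: the function names are hypothetical, the A_i and B_i are passed as Python callables, and the maximization is done by exhaustive search, which is only feasible for a handful of features. The instance at the end (constant A_i, B_i = 1, a_0 = b_0 = 0) is a made-up example.

```python
from itertools import product

def gefs(x, a0, b0, A, B):
    """GeFS(x) = (a0 + sum_i A_i(x) x_i) / (b0 + sum_i B_i(x) x_i)."""
    num = a0 + sum(A_i(x) * x_i for A_i, x_i in zip(A, x))
    den = b0 + sum(B_i(x) * x_i for B_i, x_i in zip(B, x))
    return num / den

def maximize_gefs(n, a0, b0, A, B):
    """Exhaustive search over {0,1}^n, skipping points where the denominator is zero."""
    best_x, best_val = None, float("-inf")
    for x in product((0, 1), repeat=n):
        den = b0 + sum(B_i(x) * x_i for B_i, x_i in zip(B, x))
        if den == 0:
            continue
        val = gefs(x, a0, b0, A, B)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Hypothetical instance: A_i(x) = c_i (constant), B_i(x) = 1, a0 = b0 = 0,
# so GeFS(x) is the mean score of the selected features.
c = (0.2, 0.9, 0.5)
A = [lambda x, ci=ci: ci for ci in c]
B = [lambda x: 1 for _ in c]
best_x, best_val = maximize_gefs(3, 0, 0, A, B)
print(best_x, best_val)
```

Exhaustive search visits 2^n vectors, which is why the problem is later reformulated so that it can be solved by mathematical programming instead.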
Traffic features relevant for IDS. Several feature selection measures are representable in the form (*): the Correlation Feature Selection (CFS) measure; the minimal-Redundancy-Maximal-Relevance (mRMR) measure; etc.
The CFS measure. The merit function of a feature subset S consisting of k features:
$$\mathrm{Merit}_S(k) = \frac{k\,\overline{r_{fc}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}$$
where $\overline{r_{fc}}$ is the average value of all feature-classification correlations and $\overline{r_{ff}}$ is the average value of all feature-feature correlations.
The CFS measure. The merit function reflects the following intuitive hypothesis about the quality of a feature subset: good feature subsets contain features highly correlated with the classification, yet uncorrelated to each other. The merit function is maximized in the CFS measure:
$$\max_{S} \{ \mathrm{Merit}_S(k),\ 1 \le k \le n \}$$
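The merit function and its maximization can be sketched for a small feature set. The correlation values below are hypothetical: features 0 and 1 are both well correlated with the class but almost copies of each other, while feature 2 is weaker but nearly independent of the others. The search is exhaustive, which is an assumption of this illustration and not how the measure is optimized for realistic n.

```python
from itertools import combinations
from math import sqrt

def merit(subset, r_fc, r_ff):
    """CFS merit of a feature subset given as a tuple of feature indices."""
    k = len(subset)
    mean_fc = sum(r_fc[i] for i in subset) / k
    pairs = list(combinations(subset, 2))
    mean_ff = sum(r_ff[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * mean_fc / sqrt(k + k * (k - 1) * mean_ff)

def best_subset(r_fc, r_ff):
    """Exhaustive maximization of the merit over all non-empty subsets."""
    n = len(r_fc)
    candidates = (s for k in range(1, n + 1) for s in combinations(range(n), k))
    return max(candidates, key=lambda s: merit(s, r_fc, r_ff))

# Hypothetical correlations, for illustration only.
r_fc = [0.8, 0.7, 0.6]                 # feature-classification correlations
r_ff = [[1.0, 0.95, 0.1],              # feature-feature correlations
        [0.95, 1.0, 0.1],
        [0.1, 0.1, 1.0]]
best = best_subset(r_fc, r_ff)
print(best, round(merit(best, r_fc, r_ff), 4))
```

In this example the subset {0, 2} scores higher than {0, 1} or the full set: adding the redundant feature 1 increases the denominator more than the numerator.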
The CFS measure. It can be shown that the problem of maximization of the merit function can be presented as an instance of the GeFS measure ($\mathrm{GeFS}_{CFS}$):
$$\max_{X \in \{0,1\}^n} \frac{\left( \sum_{i=1}^{n} a_i x_i \right)^2}{\sum_{i=1}^{n} x_i + \sum_{i \neq j} 2\, b_{ij}\, x_i x_j}$$
NISlab, Gjøvik University College, 09.12.2010
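The correspondence between the two forms can be checked numerically on a small example. The sketch below rests on assumptions that the slide does not state explicitly: that a_i is the correlation between feature i and the class, that b_ij is the correlation between features i and j, and that the pair sum counts each unordered pair once. Under these assumptions the binary form equals the square of the merit, so both have the same maximizer.

```python
from itertools import product
from math import isclose, sqrt

def merit_from_x(x, a, b):
    """CFS merit of the subset encoded by the 0/1 vector x (x must not be all zeros)."""
    idx = [i for i, xi in enumerate(x) if xi]
    k = len(idx)
    mean_fc = sum(a[i] for i in idx) / k
    pair_sum = sum(b[i][j] for p, i in enumerate(idx) for j in idx[p + 1:])
    mean_ff = pair_sum / (k * (k - 1) / 2) if k > 1 else 0.0
    return k * mean_fc / sqrt(k + k * (k - 1) * mean_ff)

def gefs_cfs(x, a, b):
    """Binary form: (sum a_i x_i)^2 / (sum x_i + 2 * sum over pairs i<j of b_ij x_i x_j)."""
    n = len(x)
    num = sum(a[i] * x[i] for i in range(n)) ** 2
    den = sum(x) + sum(2 * b[i][j] * x[i] * x[j]
                       for i in range(n) for j in range(i + 1, n))
    return num / den

# Hypothetical correlations, for illustration only.
a = [0.8, 0.7, 0.6]
b = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.1], [0.1, 0.1, 1.0]]
agree = all(isclose(gefs_cfs(x, a, b), merit_from_x(x, a, b) ** 2)
            for x in product((0, 1), repeat=3) if any(x))
print(agree)
```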
The mRMR measure. Based on mutual information. The relevance of features and the redundancy between features are considered simultaneously.
The mRMR measure. The relevance of a feature set S for the class c:
$$D(S, c) = \frac{1}{|S|} \sum_{f_i \in S} I(f_i; c)$$
The redundancy between features in S:
$$R(S) = \frac{1}{|S|^2} \sum_{f_i, f_j \in S} I(f_i; f_j)$$
The mRMR measure. Combining the relevance and redundancy measures, we get the mRMR measure, which is to be maximized:
$$\max_{S} \left[ \frac{1}{|S|} \sum_{f_i \in S} I(f_i; c) - \frac{1}{|S|^2} \sum_{f_i, f_j \in S} I(f_i; f_j) \right]$$
The mRMR measure. It can be shown that the problem of maximization of the mRMR measure can also be presented as an instance of the GeFS measure ($\mathrm{GeFS}_{mRMR}$):
$$\max_{X \in \{0,1\}^n} \left[ \frac{\sum_{i=1}^{n} c_i x_i}{\sum_{i=1}^{n} x_i} - \frac{\sum_{i,j=1}^{n} a_{ij}\, x_i x_j}{\left( \sum_{i=1}^{n} x_i \right)^2} \right]$$
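The set form and the binary form of the mRMR measure can be compared on a small example. This is a sketch under assumptions: c_i is taken to be the mutual information between feature i and the class, a_ij the mutual information between features i and j (with the diagonal holding each feature's self-information), and all numeric values are hypothetical.

```python
from itertools import combinations
from math import isclose

def mrmr_set(S, rel, mi):
    """mRMR score of subset S: mean relevance minus mean pairwise redundancy."""
    relevance = sum(rel[i] for i in S) / len(S)
    redundancy = sum(mi[i][j] for i in S for j in S) / len(S) ** 2
    return relevance - redundancy

def gefs_mrmr(x, c, a):
    """The same score written over a 0/1 indicator vector x (x must not be all zeros)."""
    n, k = len(x), sum(x)
    first = sum(c[i] * x[i] for i in range(n)) / k
    second = sum(a[i][j] * x[i] * x[j] for i in range(n) for j in range(n)) / k ** 2
    return first - second

# Hypothetical mutual information values (in bits), for illustration only.
rel = [0.6, 0.5, 0.4]                  # I(f_i; c)
mi = [[0.9, 0.8, 0.02],                # I(f_i; f_j), features 0 and 1 are redundant
      [0.8, 0.9, 0.02],
      [0.02, 0.02, 0.9]]

subsets = [s for k in range(1, 4) for s in combinations(range(3), k)]
best = max(subsets, key=lambda s: mrmr_set(s, rel, mi))
agree = all(
    isclose(mrmr_set(s, rel, mi),
            gefs_mrmr(tuple(int(i in s) for i in range(3)), rel, mi),
            abs_tol=1e-12)
    for s in subsets
)
print(best, agree)
```

Both forms give the same score for every subset, since selecting S corresponds to setting x_i = 1 exactly for the features in S and |S| equals the sum of the x_i.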
Solving the optimization problems. The problems of maximizing $\mathrm{GeFS}_{CFS}$ and $\mathrm{GeFS}_{mRMR}$ can be solved if we analyze them as problems of fractional programming. In particular, these problems pertain to the category of Polynomial Mixed 0-1 Fractional Programming (PM01FP) problems.
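Solving the optimization problems. The general form of PM01FP:
$$\min \sum_{i=1}^{m} \frac{a_i + \sum_{j=1}^{n} a_{ij} \prod_{k \in J} x_k}{b_i + \sum_{j=1}^{n} b_{ij} \prod_{k \in J} x_k}$$
under the following constraints:
$$b_i + \sum_{j=1}^{n} b_{ij} \prod_{k \in J} x_k > 0, \quad i = 1, \ldots, m$$
$$c_p + \sum_{j=1}^{n} c_{pj} \prod_{k \in J} x_k \le 0, \quad p = 1, \ldots, m$$
$$x_k \in \{0, 1\}, \ k \in J; \qquad a_i, b_i, c_p, a_{ij}, b_{ij}, c_{pj} \in \mathbb{R}$$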
Solving the optimization problems. By introducing appropriate substitutions, such a PM01FP can be transformed into a Mixed 0-1 Linear Programming (M01LP) problem. An M01LP can be solved by means of the branch-and-bound method. A globally optimal solution is obtained. The number of variables and constraints in the M01LP is linear in the number n of features in the full set.
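The slides do not spell out the substitutions. As an illustration of the general idea only, one standard substitution of this kind replaces a product of two binary variables x_i x_j by a new variable z constrained by z <= x_i, z <= x_j, z >= x_i + x_j - 1 and z >= 0. The sketch below (function name hypothetical) checks that these linear constraints force z to equal the product for every binary assignment; the fractional terms need further substitutions that are not shown here.

```python
from itertools import product

def feasible_z(x_i: int, x_j: int) -> tuple:
    """Interval of z allowed by z <= x_i, z <= x_j, z >= x_i + x_j - 1, z >= 0."""
    lower = max(0, x_i + x_j - 1)
    upper = min(x_i, x_j)
    return lower, upper

# For binary x_i, x_j the interval collapses to the single point x_i * x_j,
# so the nonlinear product can be replaced by z plus linear constraints.
checks = {(x_i, x_j): feasible_z(x_i, x_j) for x_i, x_j in product((0, 1), repeat=2)}
linearization_is_exact = all(lo == hi == x_i * x_j
                             for (x_i, x_j), (lo, hi) in checks.items())
print(linearization_is_exact)
```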