Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College



Contents
- The feature selection problem
- Intrusion detection
- Traffic features relevant for IDS
- The CFS measure
- The mRMR measure
- Solving the optimization problems

The feature selection problem
- The curse of dimensionality
  - 2 close points in a 2-dimensional space are probably distant in a 100-dimensional space
- Any machine learning algorithm
  - Makes a prediction of unseen data points by a hypothesis constructed from a limited number of training instances
  - In a high-dimensional space this is difficult
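A minimal sketch of the distance effect mentioned above, under one simplifying assumption that is not in the slides: the two points differ by the same small gap (0.1 here) in every coordinate. The function name and the gap value are illustrative only.

```python
import math

def distance_with_fixed_gap(dim, gap=0.1):
    """Euclidean distance between two points whose coordinates
    differ by `gap` in each of `dim` dimensions (assumed setup)."""
    return math.sqrt(dim * gap ** 2)

# The per-coordinate gap is identical, yet the distance grows with sqrt(dim).
for dim in (2, 100):
    print(dim, round(distance_with_fixed_gap(dim), 3))
```

With the same per-coordinate gap, the pair that is about 0.14 apart in 2 dimensions is about 1.0 apart in 100 dimensions.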

The feature selection problem
- Hypothesis (in this setting)
  - A pattern or function that predicts classes based on given data
- Hypothesis space
  - Contains all the hypotheses that can be learned from data

The feature selection problem
- A linear increase in the number of features (i.e. the dimension of the feature space) leads to an exponential increase of the hypothesis space
- Example: 2 classes, N binary features
  - The cardinality of the hypothesis space: $2^{2^N}$
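The count in the example can be checked with a short sketch: N binary features give 2^N distinct inputs, and each input can be labelled with either of the 2 classes. The function name is illustrative.

```python
def hypothesis_space_size(n_features):
    """Number of distinct 2-class labellings of all binary feature vectors."""
    inputs = 2 ** n_features   # distinct binary feature vectors
    return 2 ** inputs         # one of 2 classes for each vector

for n in range(1, 6):
    print(n, hypothesis_space_size(n))
```

Adding one feature at a time squares the size of the hypothesis space, which is why removing even a few features shrinks it so sharply.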

The feature selection problem
- Feature selection
  - Removes irrelevant features
  - Removes redundant features
- Consequence
  - Efficient reduction of the hypothesis space
  - Easier to find the correct hypothesis
  - Reduced number of required training instances (the reduction is exponential)

The feature selection problem
- Removing irrelevant features
  - Does not affect learning performance
- Removing redundant features
  - Redundant features are a type of irrelevant features
  - The difference: a redundant feature requires the co-presence of another feature
  - Each individual feature is relevant, but removal of one of them will not affect learning performance

The feature selection problem
- 2 types of feature selection methods
- Feature ranking
  - Ranks features according to some criterion and selects the top k features
  - A threshold is needed in advance to select the top k features
- Feature subset selection
  - Selects the minimum subset of features that does not deteriorate learning performance
  - No threshold necessary
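A minimal sketch of feature ranking, assuming each feature already has a score from some criterion. The feature names and scores are invented for illustration; the slides do not prescribe a particular criterion.

```python
def top_k_features(scores, k):
    """Return the names of the k highest-scoring features."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

# Hypothetical per-feature scores (e.g. correlation with the class label).
scores = {"duration": 0.8, "src_bytes": 0.7, "dst_bytes": 0.6, "flag": 0.1}
print(top_k_features(scores, 2))
```

The threshold k has to be chosen before the ranking is cut, and the ranking looks at each feature in isolation, so two highly redundant features can both end up in the top k.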

The feature selection problem
- Models of feature selection
- The filter model
  - Considers statistical properties of a data set directly
  - No learning algorithm involved
  - Efficient
- The wrapper model
  - Performance of a given learning algorithm is used to determine the quality of selected features

Intrusion detection
- Intrusion
  - Activities aimed at violating security (i.e. confidentiality, integrity and availability of computer and network resources)
- Intrusion detection
  - Process of detection and identification of attacks
- Intrusion prevention
  - Process of attack detection and defense management

Intrusion detection
- Intrusion detection system (IDS)
  - A system that automatically detects attacks against hosts and networks
- Intrusion prevention system (IPS)
  - A system whose ambition is to detect attacks and manage defence activities
  - An IPS contains an IDS
  - IPS combine IDS with other preventive measures (firewall, antivirus, vulnerability scanning, etc.)

Intrusion detection
- IDS classification
- According to the protected object
  - Host-based IDS
  - Network-based IDS
- According to the detection model
  - Misuse detection IDS
  - Anomaly detection IDS

Intrusion detection
- Host-based IDS
  - Collect data from internal sources, usually at the operating system level (various logs)
  - Monitor user activities
  - Monitor execution of system programs

Intrusion detection
- Network-based IDS
  - Collect packets, usually by means of network interfaces in so-called promiscuous mode (such a device collects all the packets that reach the interface, not only those addressed to the host)
  - Perform analysis of the collected packets
  - Monitor network activity

Intrusion detection
- Misuse detection systems
  - Collect information about attack indicators and then determine whether those indicators are present in incoming data
- Diagram labels: activities, attack indicators (signatures), analysis (e.g. pattern matching), attack
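A minimal sketch of the misuse detection idea, assuming signatures are plain byte patterns matched by substring search. The signature names and patterns are invented for illustration and are far simpler than the rules a real IDS uses.

```python
# Hypothetical signature base: name -> byte pattern that indicates an attack.
SIGNATURES = {
    "path traversal": b"../..",
    "sql injection": b"' OR 1=1",
}

def match_signatures(payload, signatures=SIGNATURES):
    """Return the names of all signatures whose pattern occurs in the payload."""
    return [name for name, pattern in signatures.items() if pattern in payload]

print(match_signatures(b"GET /../../etc/passwd"))
print(match_signatures(b"GET /index.html"))
```

Data that matches no known signature passes unnoticed, so the sketch can only flag attacks that are already described in the signature base.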

Intrusion detection
- Anomaly detection systems
  - Define profiles of normal behaviour of users or networks, compare actual behaviour with those profiles and generate alerts if the discrepancy from the profiles is too high
- Diagram labels: activities, profiles of normal behaviour, analysis, attack
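A minimal sketch of the anomaly detection idea, assuming the profile of normal behaviour is just the mean and standard deviation of a single numeric measurement and that "too high" means more than a fixed number of standard deviations away. The measurement, the sample values and the threshold of 3 are illustrative assumptions.

```python
import statistics

def build_profile(normal_values):
    """Profile of normal behaviour: (mean, population standard deviation)."""
    return statistics.mean(normal_values), statistics.pstdev(normal_values)

def is_anomalous(value, profile, threshold=3.0):
    """Alert if the value deviates from the mean by more than threshold * std."""
    mean, std = profile
    return abs(value - mean) > threshold * std

# Hypothetical training data: connections per minute observed during normal operation.
profile = build_profile([10, 12, 11, 9, 10, 12, 11, 9])
print(is_anomalous(11, profile))
print(is_anomalous(40, profile))
```

The profile is built from normal data only, so unlike signature matching this approach needs no prior description of the attack.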

Intrusion detection
- Diagram of the IDS processing chain, with labels: incoming traffic/logs, data pre-processor, activity data, detection model(s), detection algorithm, alerts, decision criteria, alert filter, action/report

Intrusion detection
- What properties should the data pre-processor possess?
- Which detection model is optimal?
- What is the best detection algorithm?
- What are the optimal decision criteria?
- What alert filter gives the best results?

Intrusion detection
- Until recently, the answers to those questions were heuristic
  - Intrusion detection was a more technical discipline, without a clear theoretical foundation
- After 2005, some theoretical models of IDS appeared
  - Models based on complexity theory
  - Information-theoretic models

Intrusion detection
- IDS model (1)
- An IDS is an 8-tuple
- The first 4 components are data structures
  - Data source
  - The set of data states Σ
  - The set of data unit features
  - Knowledge base about data profiles

Intrusion detection
- IDS model (2)
- The second 4 components are algorithms
  - Algorithm for feature selection
  - Algorithm for reduction and representation
  - Knowledge base generator
  - Classification algorithm

Intrusion detection
- Data source
  - A flow of consecutive data units (packets, data flow units, system calls) $(D_1, D_2, \ldots)$, where $D_i$ is the analyzed data unit, $D_i \in \{d_1, d_2, \ldots\}$, and $d_j$ is a possible data unit
  - In network-based IDS, the data source is a stream of packets $P = (P_1, P_2, \ldots)$
  - In host-based IDS, the data source can be a stream of system calls $C = (C_1, C_2, \ldots)$

Intrusion detection
- The set of data states Σ
  - Contains normality indicators for each $D_i$
  - If $D_i$ is abnormal, it is possible that the corresponding indicator from Σ also contains the type of the attack
  - In anomaly detection, Σ = {normal, abnormal} or Σ = {N, A} or Σ = {0, 1}
  - In misuse detection, Σ = {normal, attack type 1, attack type 2, ...} or Σ = {N, $A_1$, $A_2$, ...}

Intrusion detection
- The set of data unit features
  - A vector of features that contains a finite number of attributes of a data unit, $F = (f_1, f_2, \ldots, f_n)$
  - Examples: protocol name, port number, etc.
  - Every feature has its domain R
    - A set of discrete or continuous values

Intrusion detection
- Knowledge base about data profiles
  - Contains profiles of normal and abnormal data units
  - Internal structure of the base is different for each IDS (a tree, a Markov model, a Petri net, a set of rules, a base of attack signatures, etc.)
  - In misuse-based systems, the knowledge base is a set of rules that describe attack profiles (i.e. attack signatures)
  - In anomaly detection systems, the knowledge base is a profile of normal traffic

Intrusion detection
- An ideal data unit tester: $\mathrm{Oracle}_{IDS}$
  - Performs analysis of each data unit $D_i$
  - Gives the indicator value at the output: normal or abnormal
  - Always gives the correct value of the indicator
  - For each $D_i$, its state is $\mathrm{Oracle}_{IDS}(D_i)$

Intrusion detection
- Algorithm for feature selection
  - Given the data source and the corresponding states from Σ, the algorithm gives a certain number of features that the IDS will measure and base its decisions on
  - In general, the selection depends very much on the knowledge about the attack characteristics
  - The quality of the selected features mainly determines the effectiveness of the IDS

Intrusion detection
- Figure: Feature selection

Intrusion detection
- Algorithm for reduction and representation
  - During data processing, the IDS first performs data reduction, i.e. extraction of the characteristics that result from the execution of the feature selection algorithm, and then represents them in the form of a vector
  - Thus, the algorithm maps each data unit to its vector representation

Intrusion detection
- Knowledge base generator
  - To generate the knowledge base, we need an algorithm that, based on the vectorial data representations and their states, generates the knowledge base

Intrusion detection
- Figure: Knowledge base generation

Intrusion detection
- Classification algorithm
  - A function that maps the representation of the given data unit into some state, based on the knowledge base
  - Formally, it maps data unit representations to Σ

Intrusion detection
- Figure: Detection procedure (classification)

Intrusion detection
- Phases of operation of an IDS (1)
- Feature selection
  - In general, this phase is executed only once, during the development of the IDS
- Knowledge base generation
  - Sometimes called the training procedure
  - The knowledge base generator (with the help of the algorithm for reduction and representation) is executed over a large quantity of training data
  - In general executed once, but the base may occasionally be updated

Intrusion detection
- Phases of operation of an IDS (2)
- Detection procedure
  - The IDS is applied over real data in order to detect attacks
  - The most important and most often used phase

Traffic features relevant for IDS
- The goal of the feature selection algorithm in an IDS
  - To determine the most relevant features of the incoming traffic, whose monitoring ensures reliable detection of abnormal behavior
- Effectiveness of the classification heavily depends on the number of features
  - It is necessary to minimize that number, without dropping indicators of abnormal behavior

Traffic features relevant for IDS
- In contemporary IDS
  - Most of the work on feature selection is still done manually
  - The feature selection depends too much on expert knowledge, which makes it unreliable
- Better algorithms for automatic feature selection in IDS are needed

Traffic features relevant for IDS
- For IDS
  - Due to high-dimensional data, the filter model is more appropriate for automatic feature selection
  - To eliminate redundant features, the feature subset evaluating method seems to be better than the feature ranking method
- A generic feature selection measure is defined first and then the methods to maximize it are found

Traffic features relevant for IDS
- The generic feature selection measure (*)

$$\mathrm{GeFS}(X) = \frac{a_0 + \sum_{i=1}^{n} A_i(X)\, x_i}{b_0 + \sum_{i=1}^{n} B_i(X)\, x_i}, \qquad X = (x_1, \ldots, x_n) \in \{0, 1\}^n$$

  - $x_i = 1$ indicates appearance of the feature $f_i$
  - $a_0$ and $b_0$ are constants
  - $A_i(X)$ and $B_i(X)$ are linear functions
- The feature selection problem
  - Find $X \in \{0, 1\}^n$ that maximizes $\mathrm{GeFS}(X)$
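A minimal sketch of evaluating and maximizing GeFS by exhaustive search, which is only feasible for a handful of features. The representation of each linear function as a constant plus a coefficient list, the function names and the toy numbers are all assumptions made for illustration; the slides solve the problem differently, by fractional programming.

```python
from itertools import product

def linear(coeffs, x):
    """Evaluate c0 + sum_j c_j * x_j, with coeffs given as (c0, [c_1, ..., c_n])."""
    c0, cs = coeffs
    return c0 + sum(c * xj for c, xj in zip(cs, x))

def gefs(x, a0, A, b0, B):
    """GeFS(X) for a 0/1 tuple x; A[i] and B[i] hold the coefficients of A_i and B_i."""
    num = a0 + sum(linear(A[i], x) * x[i] for i in range(len(x)))
    den = b0 + sum(linear(B[i], x) * x[i] for i in range(len(x)))
    return num / den

def maximize_gefs(n, a0, A, b0, B):
    """Try every non-empty 0/1 vector and keep the one with the largest GeFS value."""
    best_x, best_value = None, float("-inf")
    for x in product((0, 1), repeat=n):
        if not any(x):
            continue  # skip the empty subset, whose denominator may be zero
        value = gefs(x, a0, A, b0, B)
        if value > best_value:
            best_x, best_value = x, value
    return best_x, best_value

# Toy instance with invented numbers: A_i is a constant relevance, B_i = 1,
# so GeFS(X) is the average relevance of the selected features.
zeros = [0, 0, 0]
A = [(0.9, zeros), (0.5, zeros), (0.2, zeros)]
B = [(1, zeros), (1, zeros), (1, zeros)]
print(maximize_gefs(3, 0, A, 0, B))
```

The search visits 2^n - 1 subsets, which illustrates why a dedicated optimization method is needed once n grows beyond a few dozen features.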

Traffic features relevant for IDS
- Several feature selection measures are representable in the form (*)
  - The Correlation Feature Selection (CFS) measure
  - The minimal Redundancy Maximal Relevance (mRMR) measure
  - Etc.

The CFS measure
- The merit function of a feature subset S consisting of k features

$$\mathrm{Merit}_S(k) = \frac{k\, r_{fc}}{\sqrt{k + k(k-1)\, r_{ff}}}$$

  where $r_{fc}$ is the average value of all feature-classification correlations and $r_{ff}$ is the average value of all feature-feature correlations

The CFS measure
- The merit function reflects the following intuitive hypothesis about the quality of a feature subset
  - Good feature subsets contain features highly correlated with the classification, yet uncorrelated to each other
- The merit function is maximized in the CFS measure

$$\max_{S} \{\mathrm{Merit}_S(k),\ 1 \le k \le n\}$$
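A minimal sketch of the merit function and its maximization by exhaustive search, assuming the correlations have already been computed. The feature names and correlation values are invented; a subset with a single feature has no feature-feature pairs, so its average feature-feature correlation is taken as 0 here.

```python
import math
from itertools import combinations

def cfs_merit(subset, r_fc, r_ff):
    """Merit = k * mean(r_fc) / sqrt(k + k * (k - 1) * mean(r_ff))."""
    k = len(subset)
    mean_fc = sum(r_fc[f] for f in subset) / k
    pairs = list(combinations(subset, 2))
    mean_ff = sum(r_ff[frozenset(p)] for p in pairs) / len(pairs) if pairs else 0.0
    return k * mean_fc / math.sqrt(k + k * (k - 1) * mean_ff)

def best_cfs_subset(features, r_fc, r_ff):
    """Exhaustive search over all non-empty subsets (small feature sets only)."""
    candidates = (s for k in range(1, len(features) + 1)
                  for s in combinations(features, k))
    return max(candidates, key=lambda s: cfs_merit(s, r_fc, r_ff))

# Hypothetical correlations: "duration" and "src_bytes" are highly redundant.
features = ("duration", "src_bytes", "dst_bytes")
r_fc = {"duration": 0.8, "src_bytes": 0.7, "dst_bytes": 0.6}
r_ff = {
    frozenset(("duration", "src_bytes")): 0.9,
    frozenset(("duration", "dst_bytes")): 0.1,
    frozenset(("src_bytes", "dst_bytes")): 0.2,
}
print(best_cfs_subset(features, r_fc, r_ff))
```

In this toy data the two features with the highest individual correlations form a worse subset than the first feature alone, because their mutual correlation inflates the denominator.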

The CFS measure
- It can be shown that the problem of maximization of the merit function can be presented as an instance of the GeFS measure ($\mathrm{GeFS}_{CFS}$)

$$\max_{X \in \{0,1\}^n} \frac{\left(\sum_{i=1}^{n} a_i x_i\right)^2}{\sum_{i=1}^{n} x_i + \sum_{i \neq j} 2 b_{ij} x_i x_j}$$

NISlab, Gjøvik University College, 09.12.2010

The mRMR measure
- Based on mutual information
- The relevance of features and the redundancy between features are considered simultaneously

The mRMR measure
- The relevance of a feature set S for the class c

$$D(S, c) = \frac{1}{|S|} \sum_{f_i \in S} I(f_i, c)$$

- The redundancy between features in S

$$R(S) = \frac{1}{|S|^2} \sum_{f_i, f_j \in S} I(f_i, f_j)$$

The mRMR measure
- Combining the relevance and redundancy measures, we get the mRMR measure, which is to be maximized

$$\max_{S} \left[ \frac{1}{|S|} \sum_{f_i \in S} I(f_i, c) - \frac{1}{|S|^2} \sum_{f_i, f_j \in S} I(f_i, f_j) \right]$$
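A minimal sketch of the mRMR score (relevance minus redundancy), assuming the mutual information values are already available. All numbers and feature names are invented; the redundancy sum runs over all ordered pairs including a feature with itself, so the table also needs the self-information of each feature.

```python
from itertools import combinations

def mrmr_score(subset, mi_class, mi_pair):
    """Relevance D(S, c) minus redundancy R(S) for a tuple of feature names."""
    size = len(subset)
    relevance = sum(mi_class[f] for f in subset) / size
    redundancy = sum(mi_pair[frozenset((f, g))]
                     for f in subset for g in subset) / size ** 2
    return relevance - redundancy

# Hypothetical mutual information values.
features = ("duration", "src_bytes", "dst_bytes")
mi_class = {"duration": 0.8, "src_bytes": 0.7, "dst_bytes": 0.6}
mi_pair = {
    frozenset(("duration",)): 1.0,   # I(f, f), the entropy of the feature
    frozenset(("src_bytes",)): 1.0,
    frozenset(("dst_bytes",)): 1.0,
    frozenset(("duration", "src_bytes")): 0.9,
    frozenset(("duration", "dst_bytes")): 0.1,
    frozenset(("src_bytes", "dst_bytes")): 0.2,
}

subsets = [s for k in range(1, len(features) + 1)
           for s in combinations(features, k)]
best = max(subsets, key=lambda s: mrmr_score(s, mi_class, mi_pair))
print(best, round(mrmr_score(best, mi_class, mi_pair), 3))
```

As with the exhaustive CFS search, enumerating every subset is shown only to make the measure concrete; it does not scale to realistic feature counts.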

The mRMR measure
- It can be shown that the problem of maximization of the mRMR measure can also be presented as an instance of the GeFS measure ($\mathrm{GeFS}_{mRMR}$)

$$\max_{X \in \{0,1\}^n} \left[ \frac{\sum_{i=1}^{n} c_i x_i}{\sum_{i=1}^{n} x_i} - \frac{\sum_{i,j=1}^{n} a_{ij} x_i x_j}{\left(\sum_{i=1}^{n} x_i\right)^2} \right]$$
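One way to see the correspondence is sketched below. It assumes, since the slides do not define the coefficients, that $c_i = I(f_i, c)$ and $a_{ij} = I(f_i, f_j)$, and that $x_i = 1$ exactly when $f_i \in S$.

```latex
\begin{align*}
|S| &= \sum_{i=1}^{n} x_i, \\
\sum_{f_i \in S} I(f_i, c) &= \sum_{i=1}^{n} c_i x_i, \\
\sum_{f_i, f_j \in S} I(f_i, f_j) &= \sum_{i,j=1}^{n} a_{ij} x_i x_j, \\
\frac{1}{|S|} \sum_{f_i \in S} I(f_i, c)
  - \frac{1}{|S|^2} \sum_{f_i, f_j \in S} I(f_i, f_j)
  &= \frac{\sum_{i=1}^{n} c_i x_i}{\sum_{i=1}^{n} x_i}
   - \frac{\sum_{i,j=1}^{n} a_{ij} x_i x_j}{\left(\sum_{i=1}^{n} x_i\right)^2}.
\end{align*}
```

The product $x_i x_j$ equals 1 only when both features are selected, which is what restricts the double sum to pairs inside S.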

Solving the optimization problems
- The problems of maximizing $\mathrm{GeFS}_{CFS}$ and $\mathrm{GeFS}_{mRMR}$ can be solved if we analyze them as problems of fractional programming
- In particular, these problems pertain to the category of Polynomial Mixed 0-1 Fractional Programming problems (PM01FP)

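Solving the optimization problems
- The general form of PM01FP

$$\min \sum_{i=1}^{m} \frac{a_i + \sum_{j=1}^{n} a_{ij} \prod_{k \in J} x_k}{b_i + \sum_{j=1}^{n} b_{ij} \prod_{k \in J} x_k}$$

- under the following constraints

$$b_i + \sum_{j=1}^{n} b_{ij} \prod_{k \in J} x_k > 0, \quad i = 1, \ldots, m$$

$$c_p + \sum_{j=1}^{n} c_{pj} \prod_{k \in J} x_k \le 0, \quad p = 1, \ldots, m$$

$$x_k \in \{0, 1\},\ k \in J; \qquad a_i, b_i, c_p, a_{ij}, b_{ij}, c_{pj} \in \mathbb{R}$$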

Solving the optimization problems
- By introducing appropriate substitutions, such a PM01FP can be transformed into a Mixed 0-1 Linear Programming problem (M01LP)
- M01LP can be solved by means of the branch-and-bound method
  - A globally optimal solution is obtained
- The number of variables and constraints in the M01LP is linear in the number n of full-set features
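The slides do not spell out the substitutions. One device commonly used in such transformations, shown here as an assumption rather than as the author's exact method, replaces the product z = x * y of a 0-1 variable x and a bounded continuous variable y (0 <= y <= K) by four linear constraints: z >= y - K(1 - x), z <= y, z <= K x, z >= 0. The sketch below checks that these constraints leave z no freedom; the function name and the bound K = 10 are illustrative.

```python
def linearized_bounds(x, y, big_k):
    """Feasible interval for z under the four linear constraints
    z >= y - big_k * (1 - x), z >= 0, z <= y, z <= big_k * x."""
    lower = max(0.0, y - big_k * (1 - x))
    upper = min(y, big_k * x)
    return lower, upper

# For every 0-1 value of x, the interval collapses to the single point x * y.
for x in (0, 1):
    for y in (0.0, 0.25, 2.0):
        print(x, y, linearized_bounds(x, y, big_k=10))
```

Because the interval always collapses to x * y, the nonlinear product can be dropped from the model and replaced by the new variable z together with the four constraints, which is the kind of step that turns a fractional 0-1 problem into a linear one.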