Data Mining for Network Intrusion Detection




Data Mining for Network Intrusion Detection S Terry Brugger UC Davis Department of Computer Science Data Mining for Network Intrusion Detection p.1/55

Overview
- Intrusion detection is important for defense in depth
- Much work has been done in the area, but there is no solution yet
- I will investigate an ensemble approach as a possible solution

We need to detect intrusions
- Can't stop intrusions, so need to mitigate them
- Can mitigate (stop the attackers) when they're detected, or take other corrective action (improving defenses)
- Part of defense in depth

Current IDSs are not sufficient
- Only detect known attacks
- Can't detect insider attacks (privilege abuse)
- Don't have a holistic picture of the network to detect multi-step attacks over a long time period
- Data for detection is available, but sysadmin resources are limited

The solution is Data Mining
Data Mining: The process of extracting useful and previously unnoticed models or patterns from large data stores. (Also called sensemaking.)

Data mining should be done in an Offline environment
- Last line of defense used in concert with real-time systems
- Allows system to be queried post hoc
- More complete session information
- Data mining techniques are expensive (even with mitigation through cost-based models [Lee])

More reasons to work in an Offline environment
- Periodic batch processing provides trade-off between timeliness and efficiency
- Allows for holistic picture, grouping related activity
- Harder to attack IDS via denial of service

We mine network connection records
- Readily available
- Efficient (good size to information ratio)
- Easy for data mining methods to operate on
- Avoids privacy issues and encryption of data streams

How network connection records break down
- Intrinsic attributes
- Essential attributes
- Axis and reference attributes
- Secondary attributes
- Calculated attributes
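
The split between intrinsic and calculated attributes can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `ConnectionRecord` field names, the helper `count_same_dst`, and the two-second window are hypothetical and are not taken from the slides. The intrinsic attributes are read directly off one connection, while the calculated attribute is derived from the neighbouring records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionRecord:
    """Intrinsic attributes of one connection (illustrative field names)."""
    start: float    # start time in seconds
    src: str        # source address
    dst: str        # destination address
    dst_port: int   # destination port (service)
    n_bytes: int    # bytes transferred


def count_same_dst(records, window=2.0):
    """Calculated attribute: for each record, count the earlier records that
    went to the same destination host within `window` seconds before it.

    `records` must be sorted by start time. The window is a hypothetical choice.
    """
    counts = []
    for i, rec in enumerate(records):
        n = 0
        # Walk backwards in time until we leave the window.
        for prev in reversed(records[:i]):
            if rec.start - prev.start > window:
                break
            if prev.dst == rec.dst:
                n += 1
        counts.append(n)
    return counts


records = [
    ConnectionRecord(0.0, "10.0.0.1", "10.0.0.9", 80, 512),
    ConnectionRecord(0.5, "10.0.0.2", "10.0.0.9", 80, 300),
    ConnectionRecord(1.0, "10.0.0.1", "10.0.0.7", 22, 900),
    ConnectionRecord(1.5, "10.0.0.3", "10.0.0.9", 443, 120),
    ConnectionRecord(5.0, "10.0.0.1", "10.0.0.9", 80, 640),
]
same_dst_counts = count_same_dst(records)
print(same_dst_counts)
```

The quadratic scan keeps the sketch short; a sliding window over the sorted records would be the obvious refinement for large traces.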

Place-holder for intrinsic attributes table

Place-holder for calculated attributes table

Additional useful information
- Calendar schema
- Normalization (pseudo-Bayes estimators, probability given other values)
- Compression (UDP, ICMP, source net, information gain)
- Selection using genetic algorithm [Helmer]
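
One possible reading of compressing connectionless traffic is to fold datagrams that share endpoints into a single pseudo-connection record. The Python sketch below does this under loudly stated assumptions: the packet tuple layout, the function name `compress_datagrams`, and the 60-second idle timeout are all hypothetical, and only one direction of each exchange is grouped.

```python
def compress_datagrams(packets, idle_timeout=60.0):
    """Group datagrams into pseudo-connection records.

    `packets` is an iterable of (time, src, sport, dst, dport, n_bytes)
    tuples sorted by time. Packets with the same (src, sport, dst, dport)
    are merged into one record until the gap between two consecutive
    packets exceeds `idle_timeout` seconds (a hypothetical value).
    """
    open_flows = {}
    finished = []
    for t, src, sport, dst, dport, n_bytes in packets:
        key = (src, sport, dst, dport)
        flow = open_flows.get(key)
        # Close the flow if it has been idle for too long.
        if flow is not None and t - flow["end"] > idle_timeout:
            finished.append(open_flows.pop(key))
            flow = None
        if flow is None:
            flow = {"key": key, "start": t, "end": t, "packets": 0, "bytes": 0}
            open_flows[key] = flow
        flow["end"] = t
        flow["packets"] += 1
        flow["bytes"] += n_bytes
    # Flush the flows that were still open at the end of the trace.
    finished.extend(open_flows.values())
    return sorted(finished, key=lambda f: f["start"])


packets = [
    (0.0, "10.0.0.1", 5000, "10.0.0.9", 53, 60),
    (1.0, "10.0.0.1", 5000, "10.0.0.9", 53, 70),
    (2.0, "10.0.0.2", 5001, "10.0.0.9", 53, 50),
    (100.0, "10.0.0.1", 5000, "10.0.0.9", 53, 60),
]
flows = compress_datagrams(packets)
for flow in flows:
    print(flow["key"], flow["packets"], flow["bytes"])
```

The first two packets collapse into one record, while the packet at 100 seconds starts a new record because the idle gap exceeds the timeout.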

Many datasets, none great
- Information Exploration Shootout (IES)
- Internet Traffic Archive [LBL]
- Security Suite 16 [InfoWorld]
- DARPA Off-line Intrusion Detection Evaluation 1998
- KDD-Cup 1999

What's so bad with the IDEval?
- McHugh identified numerous procedural problems
- Unrealistic data rates
- Failure to show relation to real traffic

More problems with IDEval
- Mahoney & Chan found problems with the data
- Some fields like TTL predictable
- Allowed naive methods to achieve high detection rates
- Correctable by mixing with real traffic
- Despite all this, DARPA dataset still the standard

Some desired features in existing systems
- Information Security Officer's Assistant (ISOA) [Winkler] and Distributed Intrusion Detection System (DIDS) [Snapp] did data fusion and multi-sensor correlation
- SRI work: IDES, NIDES, EMERALD provide more published research in this area

Commercial offerings to correlate alarms
- RealSecure SiteProtector
- Symantec ManHunt
- nSecure nPatrol
- Cisco IDS
- Network Flight Recorder (NFR)

Commercial offerings for audit trail integrity
- Computer Associates eTrust Intrusion Detection Log View
- NetSecure Log

Some functionality offered by services
- SANS Internet Storm Center
- dshield (Independent Storm Center Analysis and Coordination Center)
- mynetwatchman
- Security Focus DeepSight Analyzer
- Managed service available from Counterpane, ISS, and Symantec

Internet Storm Center

Two major data mining approaches
- Statistical (top-down)
- Machine learning (bottom-up)

Statistical techniques
- Probability of record given correlated probability of individual fields [SRI]
- Probability of record given Bayes network of conditional probabilities [Staniford]
- Probability of value not seen in training given alphabet size and time since last anomaly [Mahoney]
- Decision trees (ID3) [Sinclair]

Many types of machine learning
- Classification
- Clustering
- Support Vector Machines [Eskin,Mukkamala]
- Others

Classification approaches
- Inductive rule [Lee,Helmer,Warrender]
- Genetic algorithms [Neri,Sinclair,Dasgupta,Crosbie,Chittur]
- Fuzzy rules [Dickerson,Luo]
- Neural nets [Giacinto,Ghosh,Ryan,Endler]
- Immunological [Hofmeyr,Dasgupta,Fan]

Clustering approaches
- Fixed width, k-nearest neighbor [Portnoy,Eskin,Chan]
- k-means [Bloedorn]
- Learning Vector Quantization [Marin]
- Simulated annealing [Staniford]
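
As an illustration of the first item, here is a minimal Python sketch of single-pass fixed-width clustering. It is one common formulation and not necessarily the exact procedure in the cited work: each point joins the nearest existing cluster whose centre lies within `width`, otherwise it starts a new cluster. Treating small clusters as suspicious rests on the assumption that intrusions are rare; the names `fixed_width_clusters`, `flag_small_clusters`, and the thresholds are hypothetical.

```python
import math


def fixed_width_clusters(points, width):
    """Single-pass fixed-width clustering.

    Returns a list of (centre, members) pairs. A cluster's centre is the
    first point assigned to it and is never moved.
    """
    clusters = []
    for p in points:
        best, best_dist = None, width
        for cluster in clusters:
            d = math.dist(p, cluster[0])
            if d <= best_dist:
                best, best_dist = cluster, d
        if best is None:
            clusters.append((p, [p]))   # start a new cluster centred on p
        else:
            best[1].append(p)           # join the nearest cluster in range
    return clusters


def flag_small_clusters(clusters, min_fraction):
    """Return the points in clusters holding less than `min_fraction` of the data."""
    total = sum(len(members) for _, members in clusters)
    flagged = []
    for _, members in clusters:
        if len(members) / total < min_fraction:
            flagged.extend(members)
    return flagged


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
clusters = fixed_width_clusters(points, width=1.0)
anomalies = flag_small_clusters(clusters, min_fraction=0.25)
print(len(clusters), anomalies)
```

In practice the features would need normalizing first so that no single attribute dominates the distance.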

More Clustering approaches
- Approximate Distance Clustering & AKMDE [Marchette]
- Dynamic Clustering [Sequeira]
- Parzen-window [Yeung]
- Instance-based learner [Lane]

Other approaches
- Colored Petri nets [Kumar]
- Graphs [Staniford,Tolle]
- Markov models [Lane,Warrender]

Other proposed methods
Proposed by        Method
[Denning]          operational model
[Denning]          mean and standard deviation
[Denning]          multivariate model
[Denning]          Markov process model
[Kumar]            generalized Markov chain
[Denning]          time series model
[Frank,Endler]     Recurrent neural network

More proposed methods
Proposed by           Method
[Chan,Prodromidis]    C4.5, ID3, CART, WPEBLS
[Bass]                Dempster-Shafer method
[Bass]                Generalized EPT
[Lane]                Spectral analysis
[Lane]                Principal component analysis
[Lane]                Linear regression
[Lane]                Linear predictive coding
[Lane]                (γ, ε)-similarity

Research to date has provided progress, no solution
- Most data mining methods for ID are good at detecting particular types of malicious activity
- False positive rates are high (base-rate fallacy [Axelsson])
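
The base-rate fallacy can be made concrete with Bayes' theorem: when intrusions are rare, even a small false alarm rate means most alarms are false. The Python sketch below works one example; the function name `posterior_intrusion` and the rates used are hypothetical numbers chosen for illustration, not measurements from any system.

```python
def posterior_intrusion(base_rate, detection_rate, false_alarm_rate):
    """P(intrusion | alarm) by Bayes' theorem.

    base_rate:        P(intrusion), the fraction of records that are malicious
    detection_rate:   P(alarm | intrusion)
    false_alarm_rate: P(alarm | no intrusion)
    """
    true_alarms = detection_rate * base_rate
    false_alarms = false_alarm_rate * (1.0 - base_rate)
    return true_alarms / (true_alarms + false_alarms)


# Hypothetical scenario: 1 record in 10,000 is malicious, the detector
# catches 99% of intrusions and raises a false alarm on 1% of benign records.
p = posterior_intrusion(base_rate=1e-4, detection_rate=0.99, false_alarm_rate=0.01)
print(f"P(intrusion | alarm) = {p:.4f}")
```

Under these assumed rates, fewer than one alarm in a hundred corresponds to a real intrusion, which is why the false alarm rate dominates the usefulness of a detector.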

Better performance through Ensemble techniques
(Also called meta-learning or multi-strategy learning)
"It is well known in the machine learning literature that appropriate combination of a number of weak classifiers can yield a highly accurate global classifier." [Lane]
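
The simplest way to combine weak classifiers is a majority vote, sketched below in Python. This is a generic illustration and not the combination scheme of any cited system; the three threshold rules, their field names, and their cut-off values are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the most base classifiers.

    Ties go to the label that appears first in `predictions`.
    """
    return Counter(predictions).most_common(1)[0][0]


# Three deliberately weak, hypothetical base classifiers. Each looks at a
# single attribute of a connection record represented as a dict.
def by_duration(rec):
    return "attack" if rec["duration"] < 0.1 else "normal"


def by_bytes(rec):
    return "attack" if rec["n_bytes"] == 0 else "normal"


def by_count(rec):
    return "attack" if rec["same_dst_count"] > 20 else "normal"


def ensemble_classify(rec, classifiers):
    """Classify one record by majority vote over the base classifiers."""
    return majority_vote([clf(rec) for clf in classifiers])


classifiers = [by_duration, by_bytes, by_count]
probe_like = {"duration": 0.01, "n_bytes": 0, "same_dst_count": 3}
web_like = {"duration": 2.5, "n_bytes": 4096, "same_dst_count": 1}
print(ensemble_classify(probe_like, classifiers))
print(ensemble_classify(web_like, classifiers))
```

An unweighted vote ignores how reliable each base classifier is on each attack type, which is the gap that a trained meta-classifier is meant to fill.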

More support for Ensemble techniques
Neri notes that combining classifiers learned by different learning methods, such as hill-climbing and genetic evolution, can produce higher classification performances because of the different knowledge captured by complementary search methods.

Ensemble techniques important for ID
- "In reality there are many different types of intrusions, and different detectors are needed to detect them." [Axelsson]
- "Combining evidence from multiple base classifiers... is likely to improve the effectiveness in detecting intrusions." [Lee]

Some work has been done with Ensemble techniques
- Manually built covariance matrix in [N]IDES to use multiple classifiers
- Crosbie's autonomous agents and Staniford's SPICE also do basic correlation of statistical classifiers
- Lee, Fan, et al. proposed use for incorporating classifiers trained on new data and aging out old classifiers

More prior work with Ensemble techniques
- Lee, Fan, et al. also used cost-based meta-classifiers
- ADAM uses multiple classifiers for filtering [Barbará]
- Giacinto et al. use ensemble of neural nets trained on different feature sets

Outstanding questions
1. For baseline purposes, what is the accuracy of a contemporary NID on the DARPA dataset?
2. Ideal number of states for a Hidden Markov Model, and what parameters influence this value?
3. Ideal feature sets for different data mining techniques?
4. Should connectionless protocols, like UDP and ICMP, be compressed to a single connection (as in TCP)?
5. Separate training sets for classifiers and meta-classifiers?

More Outstanding questions
6. What is the accuracy of ensemble-based offline NID employing numerous, different techniques?
7. How much data is required in order to properly train a data-mining based IDS?
8. How dependent is data mining performance on training on the same network as it's used?
9. Should hosts and/or services be grouped together for usage profiles?

More Outstanding questions
10. Other forms of data compression to improve accuracy?
11. Predictive capabilities of an offline network intrusion detection system?
12. How much will the incorporation of additional data sources improve performance?
13. Better accuracy by considering state of hosts with connection as transition operator?
14. Does the ideal time window, w, depend on the current state of a host?

Big questions
15. What similarities or differences exist in the traffic characteristics between different types of networks that impact the performance characteristics of a network intrusion detector?
16. What is the user acceptability level of false alarms?
17. How much can false alarms be reduced through the use of user feedback, and learning algorithms or classifier retraining?

What am I going to do about it?

Answer the first six
1. For baseline purposes, what is the accuracy of a contemporary NID on the DARPA dataset?
2. Ideal number of states for a Hidden Markov Model, and what parameters influence this value?
3. Ideal feature sets for different data mining techniques?

Goal
4. Should connectionless protocols, like UDP and ICMP, be compressed to a single connection (as in TCP)?
5. Separate training sets for classifiers and meta-classifiers?
6. What is the accuracy of ensemble-based offline NID employing numerous, different techniques?

Approach
- Datasets
  - 1998, 1999 DARPA TCP data
  - 1998 DARPA mixed with real data
- Baseline
  - Snort
- Connection Mining (tcpreduce)

Place-holder for Connection Table Creation

Place-holder for Results Table Creation

Base method creation
- Training mode
- Classification mode

Anomaly detection methods
- Bayes network
- Non-self bit-vectors
- Hidden Markov Model

Classification methods
- Decision tree
- Associative rules
- Neural network
- Elman network
- Genetic algorithm
- Clustering algorithm
- Support Vector Machine

Further approach
- Ideal parameter determination
- Ideal feature set determination
- Base classifier analysis
- Training information population

About Meta-classifiers
1. Total anomaly from individual
2. Total probe from individual
3. Total DoS from individual
4. Total R2L from individual
5. Total U2R from individual
6. Total threat from individual
7. Total threat from totals

Meta-classifiers
- Naive-Bayes
- Decision tree
- Associative rules
- Neural network
- Genetic algorithm
- Support Vector Machine
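
To show what a meta-classifier consumes, the Python sketch below trains a small naive Bayes model on the outputs of base classifiers instead of on raw connection attributes. It is an illustration under stated assumptions, not the design proposed in the slides: base outputs are assumed to be binary votes (1 = alarm), the training rows are made up, and `train_naive_bayes` and `predict_naive_bayes` are hypothetical helper names.

```python
import math
from collections import defaultdict


def train_naive_bayes(rows, labels):
    """Count class and per-feature frequencies.

    Each row is a tuple of base-classifier votes (0 or 1) for one record;
    `labels` holds the true label of each record.
    """
    class_counts = defaultdict(int)
    feat_counts = defaultdict(int)   # (label, feature index, value) -> count
    for row, y in zip(rows, labels):
        class_counts[y] += 1
        for i, v in enumerate(row):
            feat_counts[(y, i, v)] += 1
    return class_counts, feat_counts, len(labels)


def predict_naive_bayes(model, row):
    """Return the label with the highest log posterior for one row of votes."""
    class_counts, feat_counts, n = model
    best_label, best_score = None, -math.inf
    for y, cy in class_counts.items():
        score = math.log(cy / n)
        for i, v in enumerate(row):
            # Add-one smoothing over the two possible votes (0 and 1).
            score += math.log((feat_counts.get((y, i, v), 0) + 1) / (cy + 2))
        if score > best_score:
            best_label, best_score = y, score
    return best_label


# Made-up meta-level training data: votes from three base classifiers.
rows = [(1, 1, 0), (1, 0, 1), (1, 1, 1), (0, 0, 0), (0, 1, 0), (0, 0, 0)]
labels = ["attack", "attack", "attack", "normal", "normal", "normal"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, (1, 0, 0)))
print(predict_naive_bayes(model, (0, 1, 0)))
```

In this toy data the first base classifier is the most reliable, so the meta-classifier learns to trust its vote over the second one, which a plain majority vote cannot do.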

Final approach
- Ideal meta-classifier training
- Ideal performance testing
- Analysis and writeup

Timeline
Step                               Finish date
Mixed dataset generated            7 October 2004
Baseline                           1 November 2004
Connection mining                  15 October 2004
Table creation                     15 August 2004
Base classifiers                   1 April 2005
HMM parameter estimation           15 April 2005
Ideal feature set determination    1 July 2005

A little more time
Base classifier analysis           15 August 2005
Training information population    1 September 2005
Meta-classifiers                   22 October 2005
Ideal meta-classifier training     15 December 2005
Ideal performance testing          1 February 2006

Completion of dissertation         1 May 2006