Data Clustering for Anomaly Detection in Network Intrusion Detection



Similar documents
HYBRID INTRUSION DETECTION FOR CLUSTER BASED WIRELESS SENSOR NETWORK

A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection

CLUSTERING-BASED NETWORK INTRUSION DETECTION

Exploiting the Data Mining Methodology for Cyber Security

Adaptive Anomaly Detection for Network Security

Parallel and Distributed Computing for Cybersecurity

Data Mining for Cyber Security

An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework

Protecting Against Cyber Threats in Networked Information Systems

Network packet payload analysis for intrusion detection

CLUSTERING-BASED NETWORK INTRUSION DETECTION

VISUALIZE NETWORK ANOMALY DETECTION BY USING K-MEANS CLUSTERING ALGORITHM

Intrusion Detection Systems, Advantages and Disadvantages

Hybrid Intrusion Detection System Using K-Means Algorithm

Conclusions and Future Directions

Network Intrusion Detection Systems

Clustering as an add-on for firewalls

An Evaluation of Machine Learning Method for Intrusion Detection System Using LOF on Jubatus

AUTONOMOUS NETWORK SECURITY FOR DETECTION OF NETWORK ATTACKS

A new Approach for Intrusion Detection in Computer Networks Using Data Mining Technique

MINDS: A NEW APPROACH TO THE INFORMATION SECURITY PROCESS

ANALYSIS OF PAYLOAD BASED APPLICATION LEVEL NETWORK ANOMALY DETECTION

The Integration of SNORT with K-Means Clustering Algorithm to Detect New Attack

Hybrid Model For Intrusion Detection System Chapke Prajkta P., Raut A. B.

The Base Rate Fallacy and its Implications for the Difficulty of Intrusion Detection

A Review of Anomaly Detection Techniques in Network Intrusion Detection System

Traffic Anomaly Detection Using K-Means Clustering

Finding Frequent Itemsets using Apriori Algorihm to Detect Intrusions in Large Dataset

Development of a Network Intrusion Detection System

Using Data Mining Techniques for Detecting Terror-Related Activities on the Web

Detecting 0-day attacks with Learning Intrusion Detection System

Summarization - Compressing Data into an Informative Representation

Efficient Security Alert Management System

Network Intrusion Detection using Random Forests

A Survey on Intrusion Detection System with Data Mining Techniques

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART C: APPLICATIONS AND REVIEWS, VOL. 38, NO. 5, SEPTEMBER

Performance Comparison between Backpropagation Algorithms Applied to Intrusion Detection in Computer Network Systems

WEB APPLICATION FIREWALL

Survey of Data Mining Approach using IDS

Network Intrusion Detection Using an Improved Competitive Learning Neural Network

Data Mining Project Report. Document Clustering. Meryem Uzun-Per

A Novel Approach for Network Traffic Summarization

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup

Social Media Mining. Data Mining Essentials

A Review on Hybrid Intrusion Detection System using TAN & SVM

NETWORK-BASED INTRUSION DETECTION USING NEURAL NETWORKS

A survey on Data Mining based Intrusion Detection Systems

Evaluating Cyber Security Tools

Speedy Signature Based Intrusion Detection System Using Finite State Machine and Hashing Techniques

A Survey on Outlier Detection Techniques for Credit Card Fraud Detection

Robust Preprocessing and Random Forests Technique for Network Probe Anomaly Detection

Data Mining for Network Intrusion Detection

Alarm Clustering for Intrusion Detection Systems in Computer Networks

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

SOME CLUSTERING ALGORITHMS TO ENHANCE THE PERFORMANCE OF THE NETWORK INTRUSION DETECTION SYSTEM

Salvatore J. Stolfo 606 CEPSR

: Introduction to Machine Learning Dr. Rita Osadchy

Non-stationary data mining: the Network Security issue

Security Measures in Data Mining

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

A Survey on Intrusion Detection using Data Mining Technique

Combining Heterogeneous Classifiers for Network Intrusion Detection

A Survey of Intrusion Detection System Using Different Data Mining Techniques

A SURVEY ON GENETIC ALGORITHM FOR INTRUSION DETECTION SYSTEM

Machine Learning using MapReduce

Web Forensic Evidence of SQL Injection Analysis

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015

Classification Algorithms in Intrusion Detection System: A Survey

IDS IN TELECOMMUNICATION NETWORK USING PCA

MACHINE LEARNING & INTRUSION DETECTION: HYPE OR REALITY?

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Combating Imbalance in Network Intrusion Datasets

Intrusion Detection using Artificial Neural Networks with Best Set of Features

False Positives Reduction Techniques in Intrusion Detection Systems-A Review

Anomaly Exposure in Network Traffic and its collision: a Survey

Unsupervised Anomaly Detection in Network Intrusion Detection Using Clusters

Data Mining for Digital Forensics

A REVIEW OF MACHINE LEARNING BASED ANOMALY DETECTION. By Mohamed Elfadly

Cluster Analysis. Alison Merikangas Data Analysis Seminar 18 November 2009

Mata : Garuda An advanced Network Monitoring System The S.L.A.D Network Security Framework. FIRST Conference Berlin, 19 June 2015

Secure Because Math: Understanding ML- based Security Products (#SecureBecauseMath)

How To Prevent Network Attacks

Taxonomy of Intrusion Detection System

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

Authentication Anomaly Detection: A Case Study On A Virtual Private Network

Application of Data Mining Techniques in Intrusion Detection

Intrusion Detection. Jeffrey J.P. Tsai. Imperial College Press. A Machine Learning Approach. Zhenwei Yu. University of Illinois, Chicago, USA

Revisit Network Anomaly Ranking in Datacenter Network Using Re-ranking

Euclidean-based Feature Selection for Network Intrusion Detection

Detection Approaches. Chapter Misuse Detection

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Handling Intrusion Detection System using Snort Based Statistical Algorithm and Semi-supervised Approach

KEITH LEHNERT AND ERIC FRIEDRICH

Intrusion Detection Using Data Mining Techniques

On Entropy in Network Traffic Anomaly Detection

Some Research Challenges for Big Data Analytics of Intelligent Security

Denial of Service Attack Detection Using Multivariate Correlation Information and Support Vector Machine Classification

Hybrid Model for Intrusion Detection using Data Mining Techniques

Index Terms: Intrusion Detection System (IDS), Training, Neural Network, anomaly detection, misuse detection.

Transcription:

Data Clustering for Anomaly Detection in Network Intrusion Detection Jose F. Nieves Polytechnic University of Puerto Rico Research Alliance in Math and Science Dr. Yu (Cathy) Jiao Applied Software Engineering Research Oak Ridge National Laboratory August 2009

Outline Introduction and motivation Background Network intrusion detection Anomaly detection Data clustering Performance evaluation Summary 2 Managed by UT-Battelle

Introduction and motivation Figure 1. Computer incidents report (CERT/CC) Figure 2. Worm spread after 30 min www.caida.org Network intrusions pose a high security risk NIDS aim to identify intrusions NIDS capable of detecting new emerging threats Main goal Present different methods for network intrusion detection Select one method and evaluate its performance 3 Managed by UT-Battelle

Network intrusion detection methods Signature-based Based on signatures of known attacks Signature database has to be manually updated Expensive and time consuming Cannot detect new emerging threats Misuse detection Models built from labeled data sets Samples labeled as 'normal' or 'attack' Expensive and time consuming Can not detect new threats Anomaly detection Identify anomalies, deviations from 'normal' behavior Can detect new emerging threats Potential false alarm rate 4 Managed by UT-Battelle

Anomaly detection Supervised methods Models built from 'normal' labeled samples Expensive and time consuming Unsupervised method (data clustering) Group unlabeled patterns into clusters based on similarities Do not required labeled samples Anomalies are deviation from 'normal' clusters 5 Managed by UT-Battelle Figure 3. Clustering for anomaly detection

Basic steps for data clustering Pattern representation (normalization, feature selection/extraction) Similarity measure (Euclidean) Grouping or clustering Figure 4. Data clustering steps 6 Managed by UT-Battelle

Algorithm Kmeans Choose number of clusters Initialize centroids or clusters centers Assign each sample to nearest cluster centroid Calculate means to be new centroid of each cluster 7 Managed by UT-Battelle Figure 5. Kmeans clustering algorithm

Software Cluto (University of Minnesota) Clustering toolkit Simple interface 3-D cluster output visualization 8 Managed by UT-Battelle Figure 6. Cluto software for data clustering

Data set Kddcup 1999 Based on the Darpa 1998 data set from MIT Lincoln Lab Original data set Total samples = 400,000 Total features = 41 (categorical and continuous values) Types of attacks: DoS, Probe, U2R, R2L Data set used Total samples = 9,200 Categorical features encoded to binary values Total features = 80 (continuous and binary values) Attacks 2% of samples 9 Managed by UT-Battelle

Performance metrics Detection rate: detected attacks / total attacks False alarm rate: normal classify as attacks / total normal 10 Managed by UT-Battelle

Performance evaluation Figure 7. Clustering visualization for K = 10 using Cluto Smaller clusters labeled as 'attacks' Big clusters labeled as 'normal' 11 Managed by UT-Battelle

Performance evaluation K = 10 (7 sec) 3 clusters 5 clusters 0.56 4.8 76 89 0 10 20 30 40 50 60 70 80 90 100 K = 20 (11 sec) K = 30 (14 sec) 8 clusters 10 clusters 9 clusters 11 clusters 65 1.58 88.5 2.2 0 10 20 30 40 50 60 70 80 90 100 72 0.56 86.5 0.87 Detection rate False alarm rate Clusters Purity (0 1) K=10 0.989 K=20 0.995 K=30 0.997 12 Managed by UT-Battelle 0 10 20 30 40 50 60 70 80 90 100 Figure 8. Bar plots of detection and false alarm rate

Summary Network intrusion detection Methods Advantages and disadvantages Method used Data clustering for anomaly detection Advantages Evalutation results Increasing number of 'attacks' clusters affect the detection rate and false alarm rate High detection rate can be achieved with a low false alarm rate 13 Managed by UT-Battelle

References 1. Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly Detection: A survey. 2007 2. Leonid Portnoy, Eleazar Eskin, Sal Stolfo. Intrusion detection with unlabeled data using clustering. 2001 3. S Zhong, TM Khoshgoftaar, N Seliya. Clustering-based network intrusion detection. 2007 4. Richard O. Duda, Peter E. Hart, David G. Stork. Pattern Classification. 2001 5. Stefano Zanero, Sergio M. Savaresi. Unsupervised learning techniques for a intrusion detection system. 2004 6. Alexandar Lazarevic, Levent Ertoz, Vipin Kumar, Aysel Ozgur, Jaideep Srivastava. A Comparative study of anomaly detection schemes in network intrusion detection. 2003 7. Varun Chandola, Eric Eilertson, Leven Ertoz, Gyorgy Simon, Vipin Kumar. Data mining for cyber security. 2006 8. Anita K. Jones, Robert S. Sielken. Computer system intrusion detection: A survey. 2000 9. Shyam Boriah, Varun Chandola, Vipin Kumar. Similarity measures for categorical data. 2008 10. Stefan Axelsson. Intrusion detection systems: A survey and taxonomy. 2000 11. George Karypis. Cluto: A clustering toolkit. 2003 12. A.K. Jain, M.N. Murty, P.J. Flynn. Data clustering: A review. 1999 14 Managed by UT-Battelle

Acknowledgements Thanks to my mentor Cathy Jiao for her time and support. Thanks to RAMS and Debbie Mccoy for this opportunity. Thanks to all the interns for their friendship. 15 Managed by UT-Battelle

Thanks!! Questions? 16 Managed by UT-Battelle