ALERT CLASSIFICATION TO REDUCE FALSE POSITIVES IN INTRUSION DETECTION

July 2006

Dissertation submitted for the doctoral degree of the Faculty of Applied Sciences of the Albert-Ludwigs-Universität Freiburg im Breisgau

Tadeusz Pietraszek

Institut für Informatik, Albert-Ludwigs-Universität Freiburg, Georges-Köhler-Allee 52, Freiburg i. Br., Germany

Dean: Prof. Dr. Jan G. Korvink
First referee: Prof. Dr. Luc De Raedt
Second referee: Prof. Dr. Johannes Fürnkranz
Date of defense:

"A new star has been discovered, which doesn't mean that things have gotten brighter or that something we've been missing has appeared...."
Wisława Szymborska, Surplus [Szy00]


Declaration

I hereby declare that I have prepared this thesis without impermissible help from third parties and without using any aids other than those stated. Data and concepts taken directly or indirectly from other sources are marked as such, with the source cited. In particular, I have not made use of paid help from placement or consulting services (doctoral consultants or other persons). No one has received from me, directly or indirectly, payment or benefits in kind for work connected with the content of the submitted dissertation. This thesis has not previously been submitted, in the same or a similar form, to any other examination authority in Germany or abroad. Furthermore, I have not previously applied, and am not simultaneously applying, for a doctorate at any other academic institution in Germany or abroad.

I hereby certify that the work embodied in this thesis is the result of original research and has not been submitted for a higher degree to any other university or institution.

Zürich, Switzerland, July 4, 2006
Tadeusz Pietraszek

Contents

Acknowledgments
Abstract
Zusammenfassung
List of Figures
List of Tables

1 Introduction: Motivation; False Positives; Existing Solutions; Introducing the Analyst: The Global Picture of Alert Management; Why Learning Alert Classifiers Works and Why It is a Difficult Learning Problem; Classifying Alerts: False Positives, True Positives or Other Classes?; Thesis Statement and Contributions; Overview

2 Intrusion Detection and Machine-Learning Background: Intrusion Detection; Intrusion Detection Systems; Two examples of IDSs; Conclusions; Machine Learning; Classification; Basic Techniques; Evaluating Classifiers; ROC Analysis; Unsupervised Techniques; Summary

3 State of the Art: Multiple Facets of Related Work; Building IDSs Using Machine Learning; Spam Filtering; Interface Agents (3.4); Alert Correlation; Frequent Episodes & Association Rules; Sensor Profiling; CLARAty; Data Mining and Root Cause Analysis; Summary

4 Datasets Used: Datasets Available; Datasets Used & Alert Labeling; Alert Representation; DARPA 1999 Data Set; Data Set B; MSSP Datasets; Summary

5 Adaptive Alert Classification: ALAC: Adaptive Learner for Alert Classification; Recommender Mode; Agent Mode; Background Knowledge; Choosing Machine-Learning Techniques; Learning an Interpretable Classifier from Examples; Background Knowledge and Efficiency; Confidence of Classification; Applying RIPPER to ALAC; Cost-Sensitive and Binary vs. Multi-Class Classification; Batch-Incremental Learning; ALAC Evaluation; Evaluation Methodology; Background Knowledge; Results Obtained with DARPA 1999 Data Set; Results Obtained with Data Set B; Understanding the Rules; Conclusions; Summary

6 Abstaining Classifiers using ROC Analysis: Introduction; Background; ROC-Optimal Abstaining Classifier; Cost-Based Model; Bounded Models; Bounded-Abstention Model; Bounded-Improvement Model; Experiments; Constructing an Abstaining Classifier; Testing Methodology; Results: Cost-Based Model; Results: Bounded Models; Alternative Representations to ROC Curves; Precision-Recall and ROC Curves; DET Curves; Cost Curves; Related Work; Conclusions and Future Work

7 ALAC+: An Alert Classifier with Abstaining Classifiers: ALAC Meets with Abstaining Classifiers; The Problem with Rule Learners; ALAC+ Evaluation; Choosing Evaluation Models for ALAC; Setting System Parameters; Cost Results; Conclusions; Summary

8 Combining Unsupervised and Supervised Learning: Why Unsupervised Learning Makes Sense; Retrospective Alert Analysis; Subsequent Alert Classification; CLARAty Algorithm Description; Generalization Hierarchies; CLARAty Algorithm; Cluster Descriptions and Filtering; Automated Cluster-Processing System; CLARAty Evaluation; Evaluation Methodology; Setting System Parameters; Cluster Persistency; Number of Clusters and Total Coverage; Automated Cluster Processing; Cluster Precision and Recall; Clustering Precision and Recall Charts; Conclusions; Combining Clustering with ALAC in a Two-Stage Alert-Classification System; CLARAty and ALAC Evaluation; ROC analysis; DARPA 1999 Data Set; Data Set B; MSSP Datasets; Conclusions; Summary

9 Summary, Conclusions and Future Work: Summary; Conclusions; Future Work

A Alert Correlation
A.1 Correlation Terminology
A.2 Alert Correlation Systems
A.2.1 Tivoli Aggregation and Correlation Component
A.2.2 Probabilistic Alert Correlation
A.2.3 Alert-Stream Fusion
A.2.4 Hyper-alert Correlation
A.2.5 Cooperative Intrusion Detection Framework
A.2.6 Correlated Hacking Behavior
A.2.7 M2D2 Formal Data Model
A.2.8 Statistical Correlation Models
A.2.9 Comprehensive IDS Alert Correlation

B Abstaining Classifier Evaluation Results

C Clustering MSSP Datasets Results

Bibliography
Table of Symbols
Index

Acknowledgments

Defending a PhD is a one-man show; however, the process of pursuing one is definitely not a one-person effort, and I have a lot of people to thank for helping me in this stage of my life.

First of all, I would like to thank my professor, Luc De Raedt, who saw value in my research and agreed to supervise it, providing support and giving directions for research. Being a remote PhD student working at IBM Zurich Research Lab is a special situation and I am really grateful that such an arrangement was possible. For all this and more, thank you, Luc.

I would also like to thank Andreas Wespi, my former manager and mentor at IBM, for hiring me and supporting me during my PhD quest, always finding time for meetings and very thoroughly scrutinizing my work. I have greatly benefited from his experience in the field of intrusion detection and computer security. I would also like to thank Lucas Heusler, my current manager, for giving me a lot of flexibility in doing my research, making the finishing of my PhD possible.

During my PhD work I was greatly supported by my mentor, Klaus Julisch, who spent a considerable amount of time explaining the arcana of scientific work, forcing me to write and tirelessly correcting my scribbles. He also never hesitated to ask those difficult questions, which helped me to become a more mature researcher.

I would like to thank the IBM Global Services Managed Security Services team, in particular Mike Fiori and Chris Calvert for allowing me to use their data, and Jim Treinen and Ken Farmer for providing support on the technical side.

I would also like to thank my friends at IBM, James Riordan and Daniela Bourges-Waldegg, for being great friends, expanding my horizons (both scientific and non-scientific) through always interesting discussions and giving me (and other PhD students) a motto: "The goal of PhD is to finish it." I have had a great time with Diego Zamboni who, in spite of (or maybe rather because of) thinking of me as a very apt procrastinator and giving me motivation to work on my numerous pet projects, always found time to say "Tadek, work on your thesis."

During my stay in the lab, I have also met many interesting friends and colleagues: Chris Giblin, Marcel Graf, Christian Hörtnagl, Ulf Nielsen, Mike Nidd, René Pawlitzek, Ulrich Schimpel, Morton Schwimmer, Abhi Shelat, Dieter Sommer, Axel Tanner, and others, who provided an excellent and stimulating working environment and always had time for interesting discussions. Among my colleagues, special thanks go to my office-mate, Chris Vanden Berghe, for putting up with me in one office during these three years, for the always-interesting discussions and arguments, and for the many interesting ideas that were born this way.

I am also grateful to the friendly people who volunteered to read through, and give me invaluable comments on, this dissertation and its earlier versions: my professor Luc De Raedt, Birgit Baum-Waidner, Axel Tanner (also for the help with the German abstract) and Andreas Wespi. Without your help this thesis would not have gotten to this stage. Clearly, I am solely responsible for any mistakes that remain in the report.

Last but not least, I am deeply indebted to my family for their everlasting support while abroad and for having by far more faith in me than anybody else, including myself! My special thanks go to Annie for being the best girlfriend and a wonderful life companion and for putting up with me during the hectic time while working on my PhD.

Zürich, Switzerland, July 4, 2006
Tadeusz Pietraszek

Abstract

Intrusion Detection Systems (IDSs) aim at detecting intrusions, that is, actions that attempt to compromise the confidentiality, integrity and availability of computer resources. With the proliferation of the Internet and the increase in the number of networked computers, coupled with the surge of unauthorized activities, IDSs have become an integral part of today's security infrastructures. However, in real environments IDSs have been observed to trigger an abundance of alerts. Most of them are false positives, i.e., alerts not related to security incidents.

This dissertation deals with the problem of false positives in intrusion detection. We propose the novel concept of training an alert classifier using a human analyst's feedback and show how to build an efficient alert classifier using machine-learning techniques. We analyze the desired properties of such a system from the domain perspective and introduce ALAC, an Adaptive Learner for Alert Classification, and its two modes of operation: a recommender mode, in which all alerts with their classification are forwarded to the analyst, and an agent mode, in which the system uses autonomous alert processing. We evaluate ALAC in both modes on real and synthetic intrusion detection datasets and obtain promising results: In our experiments ALAC reduced the number of false positives by up to 60% with acceptable misclassification rates.

Abstaining classifiers are classifiers that in certain cases can refrain from classification, which is similar to a domain expert saying "I don't know." Abstaining classifiers are advantageous over normal classifiers if they perform better than normal classifiers when they make a decision. In this dissertation we provide a clarification of the concept of optimal abstaining classifiers and introduce three different models, in which normal and abstaining classifiers can be compared: the cost-based model, the bounded-abstention model, and the bounded-improvement model. In the first cost-based model, the classifier uses an extended 2 × 3 cost matrix, whereas in the bounded models, the classifier uses a standard 2 × 2 cost matrix and boundary conditions: the abstention window or the desired cost improvement. Looking at a common type of abstaining classifiers, namely classifiers constructed from a single ROC curve, we provide efficient algorithms for selecting these classifiers optimally in each of these models. We perform an experimental validation of these methods on a variety of common benchmark datasets.

Applying abstaining classifiers to ALAC, we introduce ALAC+, an extension of our alert-classification system. We select the most suitable abstaining classifier models and show that by using abstaining classifiers one can significantly reduce the misclassification cost. For example, in our experiments with a 10% abstention the system reduced the overall misclassification cost by up to 87%. This makes abstaining classifiers particularly suitable for alert classification.

In the final part of this dissertation, we extend CLARAty, the state-of-the-art alert clustering system, by introducing automated cluster processing, and show how the system can be used to investigate missed intrusions and correct the analyst's initial classifications. Based on this, we build a two-stage alert-classification system in which alerts are processed by the automated cluster-processing system and then forwarded to ALAC. Our experiments with real and synthetic datasets showed that the automated cluster-processing system is robust and on average reduces the total number of alerts by 63%, which further reduces the analyst's workload.
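As an illustration of the abstaining-classifier idea summarized in the abstract, the following is a minimal sketch and not the dissertation's algorithm. It assumes a scoring classifier whose output is cut by two thresholds (two operating points on a single ROC curve), with abstention between them, and it evaluates the result with a 2 × 3 cost matrix whose third column is the cost of abstaining. The function names, thresholds, scores and cost values are all hypothetical and chosen only for the example.

```python
from typing import List, Optional, Sequence


def abstaining_predict(scores: Sequence[float], lower: float,
                       upper: float) -> List[Optional[int]]:
    """Return 1 (positive), 0 (negative) or None (abstain) for each score.

    Hypothetical helper: scores at or above `upper` are positive, scores
    below `lower` are negative, and everything in between is left undecided.
    Setting lower == upper gives an ordinary, non-abstaining classifier.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    predictions: List[Optional[int]] = []
    for s in scores:
        if s >= upper:
            predictions.append(1)
        elif s < lower:
            predictions.append(0)
        else:
            predictions.append(None)
    return predictions


def average_cost(labels: Sequence[int],
                 predictions: Sequence[Optional[int]],
                 cost: Sequence[Sequence[float]]) -> float:
    """Average cost per instance under a 2 x 3 cost matrix.

    cost[actual][column]: column 0 = predicted negative,
    column 1 = predicted positive, column 2 = abstained.
    """
    total = 0.0
    for actual, predicted in zip(labels, predictions):
        column = 2 if predicted is None else predicted
        total += cost[actual][column]
    return total / len(labels)


# Illustrative data only: six alerts with made-up scores and true labels.
scores = [0.05, 0.2, 0.45, 0.55, 0.8, 0.95]
labels = [0, 0, 1, 0, 1, 1]

# Assumed costs: a false positive costs 1, a false negative costs 2,
# and abstaining costs 0.2 regardless of the true class.
cost_matrix = [[0.0, 1.0, 0.2],
               [2.0, 0.0, 0.2]]

normal = abstaining_predict(scores, 0.5, 0.5)
abstaining = abstaining_predict(scores, 0.4, 0.6)

normal_cost = average_cost(labels, normal, cost_matrix)
abstaining_cost = average_cost(labels, abstaining, cost_matrix)
```

With these made-up numbers the abstaining variant declines to classify the two borderline alerts and incurs a lower average cost than the single-threshold classifier. Whether abstention pays off in practice depends on the actual cost matrix and score distribution, which is the question the three models in the dissertation address.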

Zusammenfassung (Summary)

Intrusion Detection Systems (IDSs) aim at detecting attacks, i.e., actions that attempt to compromise the confidentiality, integrity and availability of computer resources. Owing to the enormous growth of the Internet and of the number of networked computers, together with a sharp increase in unauthorized activities, IDSs have become an integral part of today's typical security infrastructure. In real environments, however, IDSs are observed to produce a great many alerts, a large proportion of which are false alarms (false positives), i.e., alerts that do not correspond to security incidents.

This dissertation deals with the problem of false positives in intrusion detection. We propose a novel concept in which an alert classifier learns from the feedback of a human analyst, and show how such an efficient alert classifier can be built using machine-learning techniques. We analyze the desirable properties of such a system from the perspective of the intrusion detection domain and introduce ALAC, the Adaptive Learner for Alert Classification. ALAC has two modes of operation: a recommender mode, in which all alerts are forwarded to the analyst together with their classification, and an agent mode, in which the system processes some alerts autonomously. We evaluate ALAC in both modes on real and synthetic intrusion detection data and obtain promising results: in these experiments ALAC reduces the number of false positives by up to 60% at acceptable misclassification rates.

Abstaining classifiers refrain from classification in certain cases, similar to a domain expert saying "I don't know." The assumption is that a classifier that is able to abstain can achieve better overall performance than normal classifiers, which must make a decision in every case. In this dissertation we clarify the concept of the optimal abstaining classifier and present three different models in which abstaining classifiers can be compared with normal classifiers: a cost-based model, a bounded-abstention model, and a bounded-improvement model. In the cost-based model the classifier uses an extended 2 × 3 cost matrix, whereas in the other models the classifier uses a standard 2 × 2 cost matrix with additional boundary conditions: the number of alerts on which the classifier abstains, or the desired cost improvement, respectively. For a common group of abstaining classifiers, namely those derived from a single ROC curve, we present efficient algorithms for selecting these classifiers optimally in all of the models named. These methods are validated experimentally on a large number of benchmark datasets.

Applying abstaining classifiers to ALAC, we introduce ALAC+, an extension of our alert-classification system. We select the most suitable abstaining classifiers and show that this can significantly reduce misclassification costs. For example, in our experiments with 10% abstention the overall misclassification cost is reduced by up to 87%. This makes abstaining classifiers particularly suitable for alert classification.

In the final part of the thesis we extend CLARAty, a state-of-the-art alert-clustering system, by introducing automated cluster processing, and show how the system can be used to investigate possibly missed attacks and to correct an analyst's initial classifications. Building on this, we develop a two-stage alert-classification system in which alerts are first processed by the automated cluster processing and then forwarded to ALAC. Our experiments with real and synthetic data show that the automated cluster-processing system is robust and reduces the total number of alerts, and with it the analyst's workload, by 63% on average.

List of Figures

1.1 Evolution of the scope for addressing false positives in intrusion detection. Shaded areas represent the scope discussed in the text
The global picture of alert management
Thesis outline
The general architecture of an IDS (based on [Axe05])
Using CSSE to preserve the metadata of string representations and to allow late string evaluation. Shaded areas represent string fragments originating from the user
A sample decision tree
Examples of ROC and ROCCH curves and the cost-optimal classifier
Multiple facets of related work
Entity-relationship diagram of concepts used by CLARAty [Jul03b]
Architecture of ALAC in agent and recommender modes
Three types of background knowledge for classifying IDS alerts
ROC curves for the base classifier used with different types of background knowledge. The fragments represent areas of practical interest (low false-positive rates and high true-positive rates)
False negatives and false positives for ALAC in agent and recommender modes (DARPA 1999 dataset, w = 50)
Number of alerts processed autonomously by ALAC in agent mode
False negatives and false positives for ALAC in agent and recommender modes (Data Set B, w = 50)
ROC performance for algorithms inducing rules for different classes: + and −
Abstaining classifier A_{α,β} constructed using two classifiers C_α and C_β
Optimal classifier paths in a bounded-abstention model
Finding the optimal classifier in a bounded model: visualization of X
Optimal classifier paths in a bounded-improvement model
Building an abstaining classifier A_{α,β}
Cost-based model: Relative cost improvement and fraction of nonclassified instances for a representative dataset (CR = 0.5, CR = 1, CR = 2)
Bounded-abstention model: Relative cost improvement and the absolute cost for one representative dataset (CR = 0.5, CR = 1, CR = 2)
6.8 Bounded-improvement model: Fraction of nonclassified instances for a representative dataset (CR = 0.5, CR = 1, CR = 2)
Conversion between sample ROC and P-R curves (N/P = 5)
Conversion between sample P-R and ROC curves (N/P = 5). The ROCCH has been transferred back to the P-R curve
Conversion between sample ROC and DET curves. Grid shows iso-cost lines at CR = 2 and
Simplified architecture of ALAC with abstaining classifiers
Classifiers for three different misclassification costs ICR = 1, ICR = 50 (used in the remaining experiments) and ICR = 200 (DARPA 1999, BA 0.1)
Three main types of clusters: false-alert candidates, true-alert candidates, and mixed clusters for further analysis
Semi-automated cluster processing
Sample generalization hierarchies for address, port and time attributes
Automated cluster processing: creating features
Automated cluster processing: filtering
The evaluation of alert clustering and filtering
Cluster persistency for DARPA 1999 Data Set and Data Set B: relative and absolute values. Arrows show cumulative cluster coverage in the clustering (beginning of an arrow) and the filtering (end of an arrow) stages for individual clustering runs
Estimating the fraction of alerts clustered and the fraction of alerts filtered as a function of the number of clusters learned. Curves correspond to individual clustering runs. Vertical lines show the smallest argument for which the target function reaches 95% of its maximum value
Clusters as filters for DARPA 1999 Data Set and Data Set B: relative and absolute values. Missed positives in Figures 8.9b and 8.9d are calculated relative to the number of true alerts (P)
Total alert reduction for clusters as filters for all datasets: relative and absolute values
Clustering and filtering precision and recall for DARPA 1999 Data Set. Data shown cumulatively for all clustering runs, with FA-clusters suppressed
Cluster 72194, describing a part of a portsweep attack
ROC curves for two types of two-stage alert-classification systems: 2FC and 2FI, for DARPA 1999 Data Set and Data Set B
Two-stage alert-classification system: False negatives and false positives for ALAC and two-stage ALAC (2FC, 2FI) in agent and recommender modes (DARPA 1999 Data Set, ICR = 50)
Two-stage alert-classification system: Number of alerts processed autonomously by ALAC and two-stage ALAC (2FC, 2FI) in agent mode
Two-stage alert-classification system: False negatives and false positives for ALAC and two-stage ALAC (2FC, 2FI) in agent and recommender modes (Data Set B, ICR = 50)
ROC curve for sample MSSP datasets
B.1 Cost-based model: Experimental results with abstaining classifiers: relative cost improvement
B.2 Cost-based model: Experimental results with abstaining classifiers: fraction of skipped instances
B.3 Bounded model: Experimental results with abstaining classifiers: relative cost improvement
B.4 Bounded model: Experimental results with abstaining classifiers: absolute cost values
B.5 Expected improvement model: Experimental results with abstaining classifiers: desired relative cost improvement vs. fraction of nonclassified instances
B.6 Expected improvement model: Experimental results with abstaining classifiers: desired absolute cost improvement vs. fraction of nonclassified instances
B.7 ALAC+, DARPA 1999 Data Set, BA0.1: False-positive rates, false-negative rates, the abstention window and the fraction of discarded alerts in both agent and recommender modes
B.8 ALAC+, Data Set B, BA0.1: False-positive rates, false-negative rates, the abstention window and the fraction of discarded alerts in both agent and recommender modes
B.9 ALAC+, DARPA 1999 Data Set, BI0.5: False-positive rates, false-negative rates, the abstention window and the fraction of discarded alerts in both agent and recommender modes
B.10 ALAC+, Data Set B, BI0.5: False-positive rates, false-negative rates, the abstention window and the fraction of discarded alerts in both agent and recommender modes
C.1 Cluster persistency for 20 MSSP customers: absolute values. X and Y axes labels are the same as in Figs. 8.7a and 8.7c
C.2 Cluster persistency for 20 MSSP customers: relative values. X and Y axes labels are the same as in Figs. 8.7b and 8.7d
C.3 Estimating the fraction of instances clustered as a function of the number of clusters learned for 20 MSSP customers. X and Y axes labels are the same as in Figs. 8.8a and 8.8c
C.4 Estimating the fraction of instances clustered as a function of the fraction of instances filtered for 20 MSSP customers. X and Y axes labels are the same as in Figs. 8.8b and 8.8d
C.5 Cluster filtering for 20 MSSP customers: absolute values. X and Y axes labels are the same as in Figs. 8.9a and 8.9c
C.6 Cluster filtering for 20 MSSP customers: relative values. X and Y axes labels are the same as in Figs. 8.9b and 8.9d
C.7 Clustering precision for 20 MSSP customers: clustering stage. X and Y axes are the same as in Fig. 8.11a
C.8 Clustering precision for 20 MSSP customers: filtering stage. X and Y axes are the same as in Fig. 8.11b
C.9 Clustering recall for 20 MSSP customers: clustering stage. X and Y axes are the same as in Fig. 8.11c
C.10 Clustering recall for 20 MSSP customers: filtering stage. X and Y axes are the same as in Fig. 8.11d

