Kickoff: Anomaly Detection Challenges




Kickoff: Anomaly Detection Challenges
A Practical Course in SS2014
Huang Xiao, Han Xiao
Chair of IT Security (I20), Department of Informatics
Technische Universität München
January 31, 2014

Overview
1 Motivation
2 How to challenge
3 Evaluation
4 References

Motivation: What is Anomaly Detection

Definition: Anomaly detection is the process of discovering patterns in data that do not conform to expected behavior. Related terms include outlier detection and novelty detection.

Anomalies are...
- Rare
- Harmful
- Confusing
- *NOT* noise

Motivation: Curse of Anomalies

Anomalous behavior often aims to compromise a system or service in order to maximize some interest.
- Fraudulent credit card transactions cause tremendous financial losses every year.
- Suspicious MRI images may indicate the presence of a tumor.
- Anomalous network traffic measurements during a certain period might indicate a network intrusion.
- Unusual noises from a motorcycle may point to engine damage, which could be fatal.

We need to do something about the anomalies.

Motivation: General Course Information

Type: Practical Course (Praktikum)
Credits: 6 SWS / 10,0 ECTS-Credits
Time: Tuesdays, 14:00 to 15:30
Start-End: 08.04.2014 to 08.07.2014
Where: Lab room 01.05.013
Advisors: Huang Xiao & Han Xiao
Language: English
Required: Enrolled in the Master or Diplom program in Informatics (Informatik) at TUM
Home page: http://ml.sec.in.tum.de/adcg/
Website of Chair: http://www.sec.in.tum.de/


How to challenge: Objective

We aim to provide challengers with a set of learning tasks. For each task, they are assigned a data set that contains some anomalies, and they are asked to detect those anomalies using methods of their own choosing. That is:
- Anomaly detection in teams
- Assigned data sets
- Apply your own algorithms
- Benchmarks on data sets
- Ranking of detection performance

How to challenge: Process

1. Team up + Task assignment
   Team up with at most 2 persons; we assign a well-designed data set to all teams.
2. Do your homework
   Apply your own algorithms (e.g., statistics-based or machine-learning-based) to the data set to find anomalies.
3. Upload your results
   Upload the results to our Kaggle competition platform (TbA) for evaluation (accuracy, false positives/negatives).
4. Report
   Present your workflow and results in class.
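As an illustration of what a "statistics-based" method in step 2 could look like, here is a minimal Python sketch of a z-score detector. It is not a method prescribed by the course; the function name, the threshold of 3.0, and the sample readings are all assumptions made for this example.

```python
import statistics


def zscore_anomalies(values, threshold=3.0):
    """Return the indices of values lying more than `threshold`
    population standard deviations away from the mean.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values are identical, so nothing stands out.
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


# Made-up sensor readings with one obvious outlier at index 9.
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 10.0, 9.7, 10.3, 25.0, 10.0, 9.9]
print(zscore_anomalies(readings))
```

A detector like this only suits roughly bell-shaped, one-dimensional data, and the outlier itself inflates the mean and standard deviation, so on real data sets a more robust method is usually needed.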

How to challenge: More information

Team work: You are expected to work on the data sets in a team of at most two persons. Working alone is also acceptable.
Data sets: All data sets are real-world data from a specific applied domain, e.g., network intrusion or credit card fraud.
Methods: Algorithms are not restricted to any category; you can use any anomaly detection method you consider relevant.
Tools: You can use any programming tools (frameworks) you like. We will give the practical lectures in Matlab.
Kaggle: Kaggle is an online competition platform; our page will open very soon.
Benchmarks: Since this is a binary classification problem, your results will be evaluated for detection accuracy and false positives/negatives.
Report: For each task, after two weeks of work you will present your results in 15 minutes and hand in a report of at most 2 A4 pages.
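The benchmark criteria (detection accuracy, false positives, false negatives) can be computed from a confusion matrix. The sketch below uses the standard textbook definitions with label 1 meaning "anomaly"; the exact scoring used on the Kaggle page is not stated in these slides, so the function names and the example labels are assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives; label 1 means anomaly."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def detection_metrics(y_true, y_pred):
    """Return accuracy, false positive rate and false negative rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Made-up example: 8 normal points, 2 anomalies.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(detection_metrics(y_true, y_pred))
```

Because anomalies are rare, accuracy alone can be misleading: a detector that always predicts "normal" would score 0.8 accuracy on the labels above while missing every anomaly, which is why the false positive and false negative rates are reported alongside it.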

How to challenge: You will also learn...

While you work on the data sets, we will also introduce:
- Classical machine learning algorithms in practice.
- How to implement your own machine learning algorithms.
- Matlab tutorials in machine learning.

Schedule and topics are now available online.

How to challenge: Possible data sets

- KDD99 Intrusion Detection data set
- German credit card fraud detection data set
- The Paper-Author data set, containing incorrect paper-author assignments
- NASA disk defect data set, containing faults on disks
- Crowded scenes data sets, consisting of videos of a crowded pedestrian walkway
- and so on...

Other suggestions are warmly welcome!


Evaluation: Evaluation of your credits

There are no oral or written exams for the practical course. Your credits are evaluated as follows:

C = 0.3 T + 0.4 R + 0.2 P + 0.1 B, where
T: Talk on the results
R: Report on the results
P: Performance in class
B: Benchmarks (ranking) on Kaggle
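As a worked example of the weighting C = 0.3 T + 0.4 R + 0.2 P + 0.1 B, the following Python sketch combines four component scores. The slides do not state the scale of each component; the sketch assumes all four are given on the same scale (here 0 to 100), and the function name and sample scores are hypothetical.

```python
def course_credit(talk, report, performance, benchmark):
    """Weighted course score: C = 0.3 T + 0.4 R + 0.2 P + 0.1 B.

    Assumes all four components share one scale (an assumption,
    not something the slides specify).
    """
    return 0.3 * talk + 0.4 * report + 0.2 * performance + 0.1 * benchmark


# Made-up scores: talk 80, report 90, class performance 70, benchmark 60.
# 0.3*80 + 0.4*90 + 0.2*70 + 0.1*60 = 24 + 36 + 14 + 6 = 80
print(round(course_credit(80, 90, 70, 60), 2))  # 80.0
```

The weights sum to 1.0, so the result stays on the same scale as the inputs; the report carries the largest weight and the Kaggle ranking the smallest.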

Evaluation: Call for Data Sets

We announce each data set and learning task in class. But... if you have any interesting data sets for anomaly detection, they are extremely welcome! Simply contact us.

Evaluation: Miscellaneous

- Register on Kaggle.com to be able to upload your results.
- We encourage using LaTeX for the report.
- Bring your own laptop, if possible with a Matlab licence [1].
- Any feedback on the course is welcome.
- Teams are supposed to work independently.

[1] You can request a student Matlab licence from the RBG: https://matlab.rbg.tum.de


References: Reading list

- Varun Chandola, et al. Anomaly Detection: A Survey. ACM Computing Surveys (CSUR), July 2009.
- Nico Görnitz, et al. Toward Supervised Anomaly Detection. Journal of Artificial Intelligence Research, Feb. 2013.
- Victoria Hodge, et al. A Survey of Outlier Detection Methodologies. Artificial Intelligence Review, Oct. 2004.
- Simon Rogers, et al. A First Course in Machine Learning. CRC Press, Inc., 2012.
- Chris Bishop. Pattern Recognition and Machine Learning. Springer, 2006.