Kickoff: Anomaly Detection Challenges
A Practical Course in SS2014
Huang Xiao, Han Xiao
Chair of IT Security (I20), Department of Informatics, Technische Universität München
January 31, 2014

Overview
1. Motivation
2. How to challenge
3. Evaluation
4. References

Motivation: What is Anomaly Detection
Definition: Anomaly detection is the process of discovering patterns in data that do not conform to expected behavior. Related terms include outlier detection and novelty detection.
Anomalies are...
- Rare
- Harmful
- Confusing
- *NOT* noise

Motivation: Curse of Anomalies
Anomalous behavior often aims to compromise a system or service in order to maximize some gain.
- Fraudulent credit card transactions cause tremendous financial losses every year.
- Suspicious MRI images may indicate the presence of a malignant tumor.
- Anomalous network traffic measurements during a certain period might indicate a network intrusion.
- Unusual noises from a motorcycle may point to engine damage, which could be fatal.
We need to do something about the anomalies.

Motivation: General Course Information
Type: Practical Course (Praktikum)
Credits: 6 SWS / 10.0 ECTS credits
Time: Tuesdays, 14:00 to 15:30
Start-End: 08.04.2014 to 08.07.2014
Where: Lab room 01.05.013
Advisors: Huang Xiao & Han Xiao
Language: English
Required: Enrollment in the Master's or Diplom program in Informatics (Informatik) at TUM
Home page: http://ml.sec.in.tum.de/adcg/
Website of Chair: http://www.sec.in.tum.de/

How to challenge: Objective
We aim to provide challengers with a set of learning tasks, each with an assigned data set that contains some anomalies. In each task, challengers detect those anomalies using methods of their own choosing.
That is:
- Anomaly detection in teams
- Assigned data sets
- Apply your own algorithms
- Benchmarks on data sets
- Ranking of detection performance

How to challenge: Process
1. Team up + task assignment: Form teams of at most 2 persons; we assign a well-designed data set to all teams.
2. Do your homework: Apply your own algorithms (e.g., statistics-based or machine-learning-based) to the data set to find anomalies.
3. Upload your results: Upload the results to our Kaggle competition platform (TbA) for evaluation (accuracy, false positives/negatives).
4. Report: Present your workflow and results in class.

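As a rough illustration of what a "statistics-based" method in step 2 could look like, the sketch below flags values whose z-score exceeds a threshold. The function name, the toy readings, and the threshold of 2.5 are all assumptions made for this example; the course does not prescribe this or any other method.

```python
from statistics import mean, stdev


def zscore_anomalies(values, threshold=3.0):
    """Return the indices of values whose absolute z-score exceeds threshold.

    Hypothetical helper for illustration only; uses the sample standard deviation.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Toy data (made up): nine ordinary readings and one obvious outlier at index 7.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 42.0, 10.1, 9.9]
flagged = zscore_anomalies(readings, threshold=2.5)
print(flagged)
```

One design caveat: in a small sample a single extreme value inflates the standard deviation, so its own z-score stays bounded. With these ten readings the outlier's z-score is below 3, which is why the example uses a lower threshold.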
How to challenge: More information
Team work: You are expected to work on the data sets in a team of at most two persons. Working alone is also acceptable.
Data sets: All data sets are real-world data from an applied domain, e.g., network intrusion or credit card fraud.
Methods: Algorithms are not restricted to any category; you can use any anomaly detection method you consider relevant.
Tools: You can use any programming tools (frameworks) you like. We will give the practical lectures in Matlab.
Kaggle: Kaggle is an online competition platform; our page will open very soon.
Benchmarks: Since this is a binary classification problem, your results will be evaluated on detection accuracy and false positives/negatives.
Report: For each task, after two weeks of work you will present your results in 15 minutes and hand in a report of at most 2 A4 pages.

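The benchmark quantities named above (accuracy, false positives, false negatives) can be sketched for a binary labeling as follows. The label encoding (1 = anomaly, 0 = normal), the function name, and the toy labels are assumptions for this example; the exact scoring used on the Kaggle page is not specified here.

```python
def detection_metrics(y_true, y_pred):
    """Compute accuracy, false positive rate and false negative rate.

    Assumes labels are encoded as 1 = anomaly, 0 = normal (an assumption
    for this sketch, not a format defined by the course).
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of normal points wrongly flagged as anomalies.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of true anomalies that were missed.
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Toy labels (made up): two true anomalies, one found, one missed, one false alarm.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
y_pred = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

Because anomalies are rare, accuracy alone can look good even when half of the anomalies are missed, as in this toy case; that is one reason to report the false positive and false negative rates alongside it.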
How to challenge: You will also learn...
While you work on the data sets, we will also introduce:
- Classical machine learning algorithms in practice
- How to implement your own machine learning algorithms
- Matlab tutorials on machine learning
Schedule and topics are now available online.

How to challenge: Possible data sets
- KDD99 intrusion detection data set
- German credit card fraud detection data set
- The Paper-Author data set, containing incorrect paper-author assignments
- NASA disk defect data set, containing faults on disks
- Crowded scenes data sets, consisting of videos of a crowded pedestrian walkway
- and so on...
Other suggestions are warmly welcome!

Evaluation: Evaluation of your credits
There are no oral or written exams for the practical course. Your credits are evaluated as follows:
C = 0.3 T + 0.4 R + 0.2 P + 0.1 B, where
T: Talk on the results
R: Report on the results
P: Performance in class
B: Benchmarks (ranking) on Kaggle

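The weighted sum above can be worked through with a small sketch. The 0 to 100 scale for each component and the sample scores are assumptions for this example; the slides do not state how T, R, P and B are scaled.

```python
# Weights taken from the formula C = 0.3 T + 0.4 R + 0.2 P + 0.1 B.
WEIGHTS = {"T": 0.3, "R": 0.4, "P": 0.2, "B": 0.1}


def course_score(talk, report, performance, benchmark):
    """Combine the four component scores into the overall score C.

    Assumes all four components use the same scale (hypothetically 0 to 100).
    """
    return (
        WEIGHTS["T"] * talk
        + WEIGHTS["R"] * report
        + WEIGHTS["P"] * performance
        + WEIGHTS["B"] * benchmark
    )


# Made-up component scores: 0.3*80 + 0.4*90 + 0.2*70 + 0.1*60, roughly 80.
score = course_score(talk=80, report=90, performance=70, benchmark=60)
print(round(score, 2))
```

Note that the report and the talk together carry 70% of the weight, while the Kaggle ranking contributes only 10%.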
Evaluation: Call for Data Sets
We announce each data set and learning task in class. But if you have any interesting data sets for anomaly detection, they are extremely welcome! Simply contact us.

Evaluation: Miscellaneous
- Register on Kaggle.com to be able to upload your results.
- We encourage using LaTeX for the report.
- Bring your own laptop, if possible with a Matlab licence [1].
- Any feedback on the course is welcome.
- Teams are supposed to work independently.
[1] You can request a student Matlab licence from the RBG: https://matlab.rbg.tum.de

References: Reading list
- Varun Chandola, et al. Anomaly Detection: A Survey. ACM Computing Surveys (CSUR), July 2009.
- Nico Görnitz, et al. Toward Supervised Anomaly Detection. Journal of Artificial Intelligence Research, Feb. 2013.
- Victoria Hodge, et al. A Survey of Outlier Detection Methodologies. Artificial Intelligence Review, Oct. 2004.
- Simon Rogers, et al. A First Course in Machine Learning. CRC Press, Inc., 2012.
- Chris Bishop. Pattern Recognition and Machine Learning. Springer, 2006.