Anomaly detection. Problem motivation. Machine Learning



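Anomaly detection example. Aircraft engine features: $x_1$ = heat generated, $x_2$ = vibration intensity. Dataset: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$. New engine: $x_{test}$. [Figure: engines plotted by $x_1$ (heat) and $x_2$ (vibration).]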

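Density estimation. Dataset: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$. Is $x_{test}$ anomalous? Build a model $p(x)$ from the dataset; flag $x_{test}$ as an anomaly if $p(x_{test}) < \varepsilon$, and treat it as normal if $p(x_{test}) \geq \varepsilon$. [Figure: same axes, $x_1$ (heat) and $x_2$ (vibration).]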

Anomaly detection example. Fraud detection: $x^{(i)}$ = features of user $i$'s activities. Model $p(x)$ from data. Identify unusual users by checking which have $p(x) < \varepsilon$. Manufacturing. Monitoring computers in a data center: $x^{(i)}$ = features of machine $i$; $x_1$ = memory use, $x_2$ = number of disk accesses/sec, $x_3$ = CPU load, $x_4$ = CPU load/network traffic.

Anomaly detection. Gaussian distribution. Machine Learning

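Gaussian (Normal) distribution. Say $x \in \mathbb{R}$. If $x$ is Gaussian distributed with mean $\mu$ and variance $\sigma^2$, we write $x \sim \mathcal{N}(\mu, \sigma^2)$. The density is $p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$.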
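The Gaussian density with mean $\mu$ and variance $\sigma^2$ can be evaluated directly. The following is a minimal sketch; the function name `gaussian_pdf` is an illustrative choice, not something taken from the slides.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Density of a univariate Gaussian with mean mu and variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


# The density peaks at the mean and is symmetric around it.
peak = gaussian_pdf(0.0, 0.0, 1.0)
left = gaussian_pdf(-1.5, 0.0, 1.0)
right = gaussian_pdf(1.5, 0.0, 1.0)
```

A larger variance spreads the same total probability over a wider range, so the peak value drops as `sigma2` grows.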

Gaussian distribution example

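Parameter estimation. Dataset: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}$. Estimates: $\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$, $\sigma^2 = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)} - \mu)^2$.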
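The standard maximum-likelihood estimates for a Gaussian are the sample mean and the sample variance normalised by $m$ (not $m - 1$). A minimal sketch, assuming that normalisation; the function name and the readings are made up for illustration.

```python
def estimate_gaussian(xs):
    """Maximum-likelihood mean and variance (1/m normalisation) of a list of numbers."""
    m = len(xs)
    mu = sum(xs) / m
    sigma2 = sum((x - mu) ** 2 for x in xs) / m
    return mu, sigma2


# Hypothetical one-dimensional readings, for illustration only.
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma2 = estimate_gaussian(readings)
```

For large $m$ the difference between dividing by $m$ and by $m - 1$ is negligible.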

Anomaly detection. Algorithm. Machine Learning

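Density estimation. Training set: $\{x^{(1)}, \dots, x^{(m)}\}$. Each example is $x \in \mathbb{R}^n$. Assume $x_j \sim \mathcal{N}(\mu_j, \sigma_j^2)$ for each feature and model $p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)$.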

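Anomaly detection algorithm
1. Choose features $x_j$ that you think might be indicative of anomalous examples.
2. Fit parameters $\mu_1, \dots, \mu_n, \sigma_1^2, \dots, \sigma_n^2$: $\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}$, $\sigma_j^2 = \frac{1}{m} \sum_{i=1}^{m} (x_j^{(i)} - \mu_j)^2$.
3. Given new example $x$, compute $p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)$. Anomaly if $p(x) < \varepsilon$.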
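The three steps can be sketched as follows. This is a minimal illustration, assuming each feature is modelled by an independent Gaussian and variances are normalised by $m$; the function names, the training data, and the threshold value are hypothetical.

```python
import math


def fit(X):
    """Step 2: per-feature mean and variance from a list of feature vectors."""
    m, n = len(X), len(X[0])
    mu = [sum(x[j] for x in X) / m for j in range(n)]
    sigma2 = [sum((x[j] - mu[j]) ** 2 for x in X) / m for j in range(n)]
    return mu, sigma2


def density(x, mu, sigma2):
    """Step 3: p(x) as the product of univariate Gaussian densities."""
    p = 1.0
    for xj, mj, s2 in zip(x, mu, sigma2):
        p *= math.exp(-((xj - mj) ** 2) / (2 * s2)) / math.sqrt(2 * math.pi * s2)
    return p


def is_anomaly(x, mu, sigma2, epsilon):
    """Flag x when its density falls below the threshold epsilon."""
    return density(x, mu, sigma2) < epsilon


# Hypothetical (heat, vibration) readings from normal engines.
X_train = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.2], [0.8, 1.8]]
mu, sigma2 = fit(X_train)
EPSILON = 1e-3  # assumed threshold; in practice it is tuned on labeled data

typical = is_anomaly([1.0, 2.0], mu, sigma2, EPSILON)
unusual = is_anomaly([3.0, 0.5], mu, sigma2, EPSILON)
```

A feature with zero variance in the training set would cause a division by zero here; a real implementation would need to guard against that.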

Anomaly detection example

Anomaly detection. Developing and evaluating an anomaly detection system. Machine Learning

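The importance of real-number evaluation. When developing a learning algorithm (choosing features, etc.), making decisions is much easier if we have a way of evaluating our learning algorithm. Assume we have some labeled data, of anomalous and non-anomalous examples ($y = 0$ if normal, $y = 1$ if anomalous). Training set: $x^{(1)}, x^{(2)}, \dots, x^{(m)}$ (assume normal examples/not anomalous). Cross validation set: $(x_{cv}^{(1)}, y_{cv}^{(1)}), \dots, (x_{cv}^{(m_{cv})}, y_{cv}^{(m_{cv})})$. Test set: $(x_{test}^{(1)}, y_{test}^{(1)}), \dots, (x_{test}^{(m_{test})}, y_{test}^{(m_{test})})$.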

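Aircraft engines motivating example. 10000 good (normal) engines; 20 flawed engines (anomalous). Training set: 6000 good engines. CV: 2000 good engines ($y = 0$), 10 anomalous ($y = 1$). Test: 2000 good engines ($y = 0$), 10 anomalous ($y = 1$). Alternative: Training set: 6000 good engines. CV: 4000 good engines ($y = 0$), 10 anomalous ($y = 1$). Test: 4000 good engines ($y = 0$), 10 anomalous ($y = 1$). Since 6000 + 4000 + 4000 exceeds the 10000 good engines available, the alternative necessarily reuses good engines across the CV and test sets.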
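The first split (6000 / 2000 / 2000 good engines, with the 20 flawed engines divided between CV and test) can be sketched as below. The function name and the integer stand-ins for engines are hypothetical, and the sketch assumes the lists are already in random order.

```python
def split_engines(good, flawed):
    """60/20/20 split of normal examples; anomalies go only to CV and test."""
    n_train = len(good) * 6 // 10
    n_cv = len(good) * 2 // 10
    half = len(flawed) // 2
    train = good[:n_train]  # unlabeled, assumed normal
    cv = [(x, 0) for x in good[n_train:n_train + n_cv]] + [(x, 1) for x in flawed[:half]]
    test = [(x, 0) for x in good[n_train + n_cv:]] + [(x, 1) for x in flawed[half:]]
    return train, cv, test


# Integer IDs stand in for engine feature vectors (illustration only).
good = list(range(10000))
flawed = list(range(10000, 10020))
train, cv, test = split_engines(good, flawed)
```

No engine appears in more than one set, so the CV and test estimates stay independent.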

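Algorithm evaluation. Fit model $p(x)$ on training set $\{x^{(1)}, \dots, x^{(m)}\}$. On a cross validation/test example $x$, predict $y = 1$ if $p(x) < \varepsilon$ (anomaly) and $y = 0$ if $p(x) \geq \varepsilon$ (normal). Possible evaluation metrics: true positive, false positive, false negative, true negative; precision/recall; $F_1$ score. Can also use cross validation set to choose parameter $\varepsilon$.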
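One way to choose the threshold on the cross validation set is to try several candidate values and keep the one with the highest $F_1$ score. A minimal sketch, assuming the densities for the CV examples have already been computed; the function names, density values, and candidate thresholds are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 from true positives, false positives and false negatives (anomaly = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_epsilon(p_cv, y_cv, candidates):
    """Return the candidate threshold with the best F1 on the CV set."""
    best_eps, best_f1 = None, -1.0
    for eps in candidates:
        y_pred = [1 if p < eps else 0 for p in p_cv]
        score = f1_score(y_cv, y_pred)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1


# Hypothetical densities p(x) and labels for six CV examples.
p_cv = [0.30, 0.25, 0.02, 0.40, 0.001, 0.35]
y_cv = [0, 0, 1, 0, 1, 0]
best_eps, best_f1 = select_epsilon(p_cv, y_cv, [0.0005, 0.01, 0.1, 0.28])
```

Classification accuracy would be a poor metric here: with so few anomalies, always predicting "normal" scores highly while catching nothing.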

Anomaly detection. Anomaly detection vs. supervised learning. Machine Learning

Anomaly detection: Very small number of positive examples ($y = 1$) (0-20 is common). Large number of negative ($y = 0$) examples. Many different types of anomalies. Hard for any algorithm to learn from positive examples what the anomalies look like; future anomalies may look nothing like any of the anomalous examples we've seen so far. vs. Supervised learning: Large number of positive and negative examples. Enough positive examples for the algorithm to get a sense of what positive examples are like; future positive examples are likely to be similar to ones in the training set.

Anomaly detection: fraud detection; manufacturing (e.g. aircraft engines); monitoring machines in a data center. vs. Supervised learning: email spam classification; weather prediction (sunny/rainy/etc.); cancer classification.

Anomaly detection. Choosing what features to use. Machine Learning

Non-Gaussian features

Error analysis for anomaly detection. Want $p(x)$ large for normal examples $x$, and $p(x)$ small for anomalous examples $x$. Most common problem: $p(x)$ is comparable (say, both large) for normal and anomalous examples.

Monitoring computers in a data center. Choose features that might take on unusually large or small values in the event of an anomaly. $x_1$ = memory use of computer, $x_2$ = number of disk accesses/sec, $x_3$ = CPU load, $x_4$ = network traffic.

Anomaly detection. Multivariate Gaussian distribution. Machine Learning

Motivating example: Monitoring machines in a data center. [Figure: scatter plot of CPU load against memory use, with separate one-dimensional plots for CPU load and memory use.]

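Multivariate Gaussian (Normal) distribution. $x \in \mathbb{R}^n$. Don't model $p(x_1), p(x_2), \dots$ etc. separately. Model $p(x)$ all in one go. Parameters: $\mu \in \mathbb{R}^n$, $\Sigma \in \mathbb{R}^{n \times n}$ (covariance matrix). Density: $p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right)$.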

Multivariate Gaussian (Normal) examples


Anomaly detection. Anomaly detection using the multivariate Gaussian distribution. Machine Learning

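Multivariate Gaussian (Normal) distribution. Parameters: $\mu$, $\Sigma$. Density: $p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right)$. Parameter fitting: Given training set $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$, set $\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$ and $\Sigma = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)} - \mu)(x^{(i)} - \mu)^T$.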

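Anomaly detection with the multivariate Gaussian
1. Fit model $p(x)$ by setting $\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$ and $\Sigma = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)} - \mu)(x^{(i)} - \mu)^T$.
2. Given a new example $x$, compute $p(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right)$. Flag an anomaly if $p(x) < \varepsilon$.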
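The two steps can be sketched in plain Python for the two-feature case. This is an illustration under stated assumptions: the density function is written out for $n = 2$ only (so the $2 \times 2$ inverse and determinant are explicit), the covariance is normalised by $m$, and all names and data are hypothetical.

```python
import math


def fit_multivariate(X):
    """Mean vector and covariance matrix (1/m normalisation)."""
    m, n = len(X), len(X[0])
    mu = [sum(x[j] for x in X) / m for j in range(n)]
    sigma = [[sum((x[a] - mu[a]) * (x[b] - mu[b]) for x in X) / m
              for b in range(n)] for a in range(n)]
    return mu, sigma


def density_2d(x, mu, sigma):
    """Multivariate Gaussian density, written out for n = 2 only."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))


# Hypothetical (CPU load, memory use) pairs that rise together.
X_train = [[1.0, 1.0], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2], [5.0, 4.8]]
mu, sigma = fit_multivariate(X_train)
EPSILON = 1e-3  # assumed threshold

on_trend = density_2d([2.0, 2.0], mu, sigma)   # both features low together
off_trend = density_2d([2.0, 4.0], mu, sigma)  # low CPU load, high memory use
```

Each coordinate of the off-trend point is within the range seen in training, yet its density is tiny because the combination breaks the correlation the covariance matrix has captured.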

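Relationship to original model. Original model: $p(x) = p(x_1; \mu_1, \sigma_1^2) \times p(x_2; \mu_2, \sigma_2^2) \times \cdots \times p(x_n; \mu_n, \sigma_n^2)$. Corresponds to multivariate Gaussian $p(x; \mu, \Sigma)$ where $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2)$, i.e. all off-diagonal entries of $\Sigma$ are zero.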
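The correspondence can be checked numerically: with a diagonal covariance matrix, the determinant is the product of the variances and the quadratic form reduces to a sum of per-feature terms, so the multivariate density equals the product of univariate densities. A minimal sketch with hypothetical names and values:

```python
import math


def univariate_pdf(x, mu, sigma2):
    """Univariate Gaussian density with variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def product_model(x, mu, sigma2):
    """Original model: product of independent per-feature densities."""
    p = 1.0
    for xj, mj, s2 in zip(x, mu, sigma2):
        p *= univariate_pdf(xj, mj, s2)
    return p


def diagonal_multivariate_pdf(x, mu, sigma2):
    """Multivariate Gaussian with Sigma = diag(sigma2)."""
    n = len(x)
    det = math.prod(sigma2)  # determinant of a diagonal matrix
    quad = sum((xj - mj) ** 2 / s2 for xj, mj, s2 in zip(x, mu, sigma2))
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (n / 2) * math.sqrt(det))


# Arbitrary illustrative values.
x = [1.5, -0.3, 2.0]
mu = [1.0, 0.0, 2.5]
sigma2 = [0.5, 2.0, 1.0]

p_product = product_model(x, mu, sigma2)
p_multivariate = diagonal_multivariate_pdf(x, mu, sigma2)
```

The two values agree up to floating-point rounding.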

Original model vs. Multivariate Gaussian. Original model: Manually create features to capture anomalies where $x_1, x_2$ take unusual combinations of values. Computationally cheaper (alternatively, scales better to large $n$). OK even if $m$ (training set size) is small. Multivariate Gaussian: Automatically captures correlations between features. Computationally more expensive. Must have $m > n$, or else $\Sigma$ is non-invertible.