What's Behind Big Data and Behavioral Analytics

STEPHAN JOU, CTO
ISSA TORONTO
What's Behind Big Data and Behavioral Analytics

Hey. I'm Stephan Jou
CTO at Interset
Previously: IBM's Business Analytics CTO Office
Big data analytics, visualization, cloud, predictive analytics, data mining, neural networks, mobile, dashboarding and semantic search
M.Sc. in Computational Neuroscience and Biomedical Engineering, and a dual B.Sc. in Computer Science and Human Physiology, all from the University of Toronto
Email: sjou@interset.com
Twitter: @eeksock

Catching Bad Guys With Math
Threat Detection (Insider and Compromised Machine Attack) Through the Science of Behavioral Analytics

Who Is This?
Lessons:
- There were limited systems in place, and we still do not know all that he took
- His actions were highly anomalous
  - Volumes of data
  - Access to improper accounts
  - Usage of USB storage devices
There was plenty of evidence and time, if only it was visible!

Who Are These Two?
Lessons:
- Disgruntled insiders (employees) can be a risk
What were the anomalies?
- Copied 16,000 documents within five days of receiving severance
There was plenty of evidence and time, if only it was visible!

And This Guy?
Lessons:
- Most attacks are from users/identities with proper access
- Attacker stayed under the radar for years
- Third parties (US Intelligence) most often uncover the attack
What were the anomalies?
- Accessing data not related to his job
- Moving data in ways that same-role users were not, over time
- Money problems
There was plenty of evidence and time, if only it was visible!

And these guys?
"if we do this right, we will make a million dollars each we could have already sold them for Bitcoins which would have been untraceable if we did it right. It could have already been easily an easy 50 grand."
Lessons:
- Make sure your partners are secure
- Hacked (SQL injection) a partner with a weak network
- Stole user names and passwords
- Identities & machines are entities
- They acted in highly anomalous ways
- Moved large amounts of data
- Moved data to exfiltration points
- At four companies and the US Army!
There was plenty of evidence and time, if only it was visible!

How Do You Catch the Authorized User?
- 75% of material loss is via insiders with approved access
- In 70% of IP theft cases, insiders steal information within 30 days of announcing their resignations
- 62% of employees believe it is acceptable to transfer work documents to personal devices or cloud-based file sharing services, even if a company policy prohibits it
- 60% of employees believe information they had been involved in developing is theirs, regardless of the IP protection policy of the company
- 51% of employees say their company does not strictly enforce policies, so they feel it is more than OK to take corporate data
- 20% of loss involved collaboration with one or more employees
Source: Symantec & 2011 Cyber Watch Survey, Carnegie Mellon University CERT Program

Enterprise: Where's Bad Waldo?
2014 Interset, a FileTrek Company

Kung Fu Move #1: Big Data
Source: OliverMunday.com

The Four V's of Big Data (Sorry)
Transactional, Machine, Social, Reputation
Volume, Velocity, Variety, Veracity

Kung Fu Move #2: Math
New Methods
- Adaptive Analysis: Responding to context
- Continual Analysis: Responding to local change/feedback
Traditional
- Optimization under Uncertainty: Quantifying or mitigating risk
- Optimization: Decision complexity, solution speed
- Predictive Modeling: Causality, probabilistic, confidence levels
- Simulation: High fidelity, games, data farming
- Forecasting: Larger data sets, nonlinear regression
- Alerts: Rules/triggers, context sensitive, complex events
- Query/Drill Down: In memory data, fuzzy search, geo spatial
- Ad hoc Reporting: Query by example, user defined reports
- Standard Reporting: Real time, visualizations, user interaction
New Data
- Entity Resolution: People, roles, locations, things
- Relationship, Feature Extraction: Rules, semantic inferencing, matching
- Annotation and Tokenization: Automated, crowd sourced
Source: Competing on Analytics, Davenport and Harris, 2007

Venn Diagram of Data Science
"Hacking" here means computer science skills
The problem: if you choose the wrong math, you will have false positives and an ineffective system
Source: Drew Conway, http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Standard Thresholds Approach
Source: A Pattern for Increased Monitoring for Intellectual Property Theft by Departing Insiders, Andrew Moore et al., Carnegie Mellon, 2011

The Threshold Approach Challenge

Behavioral Analytics: A simple example
Edward Snowden is copying an unusually large number of sensitive files to an external USB drive.
- User: Edward Snowden was a contractor, sysadmin with privileged access
- Activity: The volume of copying is large, compared to Snowden's past 30 days, and compared to other analysts
- Asset: These files have a high risk and importance value
- Method: USB drives are marked as high risk channels

Use Appropriate Math to Assemble the Data
$$
R_{\text{behavior}} = P(\text{event} \mid y)\, w_y \left[ \frac{ w_u \sum_{u \in U} 2^{-i} R_{u[i]} + w_f \sum_{f \in F} 2^{-j} R_{f[j]} + w_m \sum_{m \in M} 2^{-k} R_{m[k]} }{ w_u + w_f + w_m } \right]
$$
Here $w_y$ weights the activity, and the three sums run over the users $U$, files $F$ and methods $M$ involved.
- Risk scores are percentages between 0% (no risk) and 100% (extreme risk)
- $P(\text{event} \mid y)$ is the probability that the behavior occurred, either observed or predicted
- Aggregate risk values combine risks associated with the activity, people, assets and end points
- Model based on Expected Utility Theory and the standard risk model (Risk = Probability * Impact)
- Mathematical weighting is used to tune and train the model for specific activities, people, assets and end points on a per-behavior pattern basis
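As a rough illustration of how the aggregate risk equation on this slide could be evaluated, here is a minimal Python sketch. Several details are assumptions, because the slide's formula did not survive extraction cleanly: the factor in front of each entity risk is read as 2^-i, the per-entity risks are sorted from highest to lowest, and the index starts at 1. The function names `discounted_sum` and `behavior_risk` are hypothetical and not taken from any Interset product.

```python
def discounted_sum(risks):
    """Sum entity risks (each in [0, 1]), weighting the i-th highest by 2**-i.

    Assumption: risks are ranked high-to-low and i starts at 1, which keeps
    the sum strictly below 1 no matter how many entities are involved.
    """
    ordered = sorted(risks, reverse=True)
    return sum(risk * 2.0 ** -i for i, risk in enumerate(ordered, start=1))


def behavior_risk(p_event, w_activity, user_risks, file_risks, method_risks,
                  w_u=1.0, w_f=1.0, w_m=1.0):
    """Probability of the event times a weighted blend of entity risks."""
    blended = (
        w_u * discounted_sum(user_risks)
        + w_f * discounted_sum(file_risks)
        + w_m * discounted_sum(method_risks)
    ) / (w_u + w_f + w_m)
    return p_event * w_activity * blended


# Hypothetical inputs: one risky user, two files, one high-risk channel.
score = behavior_risk(
    p_event=0.9, w_activity=1.0,
    user_risks=[0.8], file_risks=[0.6, 0.4], method_risks=[1.0],
)
print(f"behavior risk: {score:.0%}")
```

Under these assumptions the score stays between 0 and 1, which matches the slide's statement that risk scores are percentages between 0% and 100%.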

Important Questions
- Who or what is behaving abnormally?
- Who is stealing my stuff?
- Where is my important, at-risk stuff?
- Who is going to leave the company?

Some Simple Anomaly Models
Who or what is behaving abnormally?
- Person Name is accessing information during unusual working hours.
- Person Name accessed a storage volume, path, an unusually large number of times.
- Person Name accessed an important file type an unusually large number of times.
- Riskiest Users
Who is going to steal my stuff?
- Person Name accessed an abnormally large amount of data.
- Person Name performed an abnormally large number of file exits.
Where is my important, at-risk stuff?
- Riskiest Files
Who is going to leave the company?

More Sophisticated Anomaly Models
Who or what is behaving abnormally?
- Person Name is using an unexpected file, filename.
- Person Name is touching an unexpected set of files.
- Person Name is consistently accessing higher amounts of data than similar users.
- Person Name is consistently accessing an important file type more than similar users.
- Person Name is accessing information during different working times compared to similar users.
- An application accessed an unexpected file type.
Who is going to steal my stuff?
- Person Name has accessed an unusual amount of total file value.
- Person Name is consistently performing more file exits than similar users.
- Person Name's amount of file exits varies more than similar users.
- Person Name has replicated a large amount of source code.
Where is my important, at-risk stuff?
- Highest at-risk machines, file shares, and source code repositories
- The file, Filename, is highly valuable compared to similar files.
- The following source code projects are most at-risk.
- Similar users visualization
- Similar files visualization
- Similar machines visualization
Who is going to leave the company?
- Person Name is hoarding an unusual amount of source code.
- Person Name has been accessing unexpected source code repositories.
- Person Name is engaging in job search activities.
- The proportion of time spent by Person Name on non-work activities has changed.
- Person Name has emailed themselves.

Computing Probability of an Anomalous Event
- Each term in the aggregate behavior risk equation has analytics behind it
- Highly anomalous activities, compared to baseline, should result in a high value
- How to compute the probability of an anomalous event?
$$
R_{\text{behavior}} = P(\text{event} \mid y)\, w_y \left[ \frac{ w_u \sum_{u \in U} 2^{-i} R_{u[i]} + w_f \sum_{f \in F} 2^{-j} R_{f[j]} + w_m \sum_{m \in M} 2^{-k} R_{m[k]} }{ w_u + w_f + w_m } \right]
$$

Model: Unusual volumes
- Computes the probability that a value in a given hour is anomalous (Bayesian approach)
- Explicitly models both normal and abnormal distributions (Gaussian, Gamma)
- Estimators for both normal and abnormal based on observation
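The Bayesian idea on this slide can be sketched in a few lines of Python. This is only one plausible reading, not the presenter's implementation: it assumes normal hourly volumes follow a Gaussian, abnormal volumes follow a Gamma distribution, both are fitted by simple moment matching, and the prior probability of abnormal activity (`prior_abnormal`) is a made-up tuning value. All function names are hypothetical.

```python
import math


def mean_and_variance(values):
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


def fit_gaussian(values):
    """Return (mean, std) estimated from observed normal volumes."""
    mean, variance = mean_and_variance(values)
    return mean, math.sqrt(variance)


def fit_gamma(values):
    """Return (shape, scale) by the method of moments."""
    mean, variance = mean_and_variance(values)
    return mean * mean / variance, variance / mean


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gamma_pdf(x, shape, scale):
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def p_anomalous(x, normal_params, abnormal_params, prior_abnormal=0.01):
    """Posterior P(abnormal | x) via Bayes' rule over the two models."""
    abnormal = prior_abnormal * gamma_pdf(x, *abnormal_params)
    normal = (1.0 - prior_abnormal) * gaussian_pdf(x, *normal_params)
    total = abnormal + normal
    return abnormal / total if total > 0 else prior_abnormal


# Hypothetical files-copied-per-hour observations.
normal_model = fit_gaussian([40, 55, 50, 45, 60, 52, 48, 50])
abnormal_model = fit_gamma([400, 650, 900, 500])
for volume in (50, 500):
    print(volume, p_anomalous(volume, normal_model, abnormal_model))
```

Modeling the abnormal distribution explicitly, rather than only thresholding the normal one, lets the posterior stay low for values that are rare but still far more typical of normal behavior.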

Example: Modeling unusual times
- Monitor, for each user, start times of when a file or window is brought into focus
- Active times used as input into Gaussian kernel density estimators
- Times that contain 95% of activity deemed to be normal
- P(y is bad) at a given time is the ratio of expected activity to the 95% activity line
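A minimal sketch of the unusual-times model follows, under stated assumptions. The slide does not spell out the exact ratio, so this version interprets the "95% activity line" as the density level that 95% of observed start times sit at or above, and scores a time as `1 - density / line`, clipped at zero. The bandwidth, the sample data and the function names are all hypothetical, and wrap-around at midnight is ignored for brevity.

```python
import math


def kde(t, samples, bandwidth):
    """Gaussian kernel density estimate of activity at hour-of-day t."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in samples) / norm


def activity_line(samples, bandwidth, coverage=0.95):
    """Density level such that `coverage` of observed samples lie at or above it."""
    densities = sorted((kde(s, samples, bandwidth) for s in samples), reverse=True)
    cutoff = max(0, math.ceil(coverage * len(densities)) - 1)
    return densities[cutoff]


def p_unusual_time(t, samples, bandwidth=1.0, coverage=0.95):
    """0.0 inside the normal region, approaching 1.0 where activity is never seen."""
    ratio = kde(t, samples, bandwidth) / activity_line(samples, bandwidth, coverage)
    return max(0.0, 1.0 - ratio)


# Hypothetical start times (hours since midnight) for one user.
focus_times = [9.0, 9.5, 10.0, 10.5, 11.0, 13.0, 14.0, 14.5, 15.0, 16.0, 16.5, 17.0]
print(p_unusual_time(10.0, focus_times))  # mid-morning, inside the normal region
print(p_unusual_time(3.0, focus_times))   # 3 a.m., far from any observed activity
```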

Model: Unusual Working Days
User 1
- Regularly works six days a week (takes Sundays off)
- Slight dip during lunches
User 2
- Works five days a week
- Particularly active on Thursdays

Model: Unusual Working Hours
User 1
- Starts work fairly early in the morning
- Early lunch break
- Sometimes works past midnight
User 2
- Doesn't work as long hours as User 1
- 9-to-5er
- Has occasionally worked a little bit after 8pm

Model: Clustering Unusual Entities
- Clusters are created based on observed behaviors of a target set of entities
  - Users, machines, assets
- Clusters are created for like behaviors, and outliers are anomalous
  - User actions
  - Access to data
  - Applications open/run
  - File actions
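The slide does not name a clustering algorithm, so the sketch below swaps in a deliberately simple one: entities whose behavior vectors lie within a distance `eps` of each other are chained into the same cluster, and any cluster smaller than `min_cluster_size` is treated as an outlier. The two features (files touched per day, applications run per day), the entity names and the parameter values are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_entities(behaviors, eps):
    """Group entities by chaining together any pair closer than eps."""
    unvisited = set(behaviors)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = {name for name in unvisited
                    if euclidean(behaviors[current], behaviors[name]) <= eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(cluster)
    return clusters


def find_outliers(behaviors, eps, min_cluster_size=2):
    """Entities that ended up in a cluster too small to count as a peer group."""
    return {name
            for cluster in cluster_entities(behaviors, eps)
            if len(cluster) < min_cluster_size
            for name in cluster}


# Hypothetical (files touched per day, applications run per day) per user.
users = {
    "alice": (10, 2), "bob": (11, 3), "carol": (9, 2),
    "dave": (50, 40), "erin": (52, 41),
    "mallory": (200, 5),
}
print(find_outliers(users, eps=5.0))
```

In practice the features would need scaling so that no single measurement dominates the distance, and a library implementation such as a density-based clusterer would likely replace this hand-rolled loop.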

Reduce False Positives
- Increase risk of an entity (e.g. user) based on probability, severity, risk and recency of observed behavioral events (anomalies, violations, exfiltrations)
- Allows real-time aggregation or correlation of multiple event models
- Reduces false positives and noise
Example: John Sneakypants is accessing an unusual, important network share, at a time of day he was almost never active at before, and took from a source code project that has been inactive for months, and just copied an unusual amount of sensitive files to a USB drive
Risk scores: 25, 46, 80, 96
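One way to picture how several weak signals add up to a strong one is a noisy-OR combination with recency decay, sketched below. This is a swapped-in technique chosen for illustration: the slide does not state the aggregation formula, and the sketch does not attempt to reproduce the scores shown on the slide (25, 46, 80, 96). The half-life, the event tuples and the function names are hypothetical.

```python
def event_weight(probability, severity, age_hours, half_life_hours=24.0):
    """Contribution of one event: probability x severity, halved every half-life."""
    return probability * severity * 0.5 ** (age_hours / half_life_hours)


def entity_risk(events, half_life_hours=24.0):
    """Noisy-OR over events: risk rises with each event but never exceeds 1."""
    remaining = 1.0
    for probability, severity, age_hours in events:
        remaining *= 1.0 - event_weight(probability, severity, age_hours, half_life_hours)
    return 1.0 - remaining


# Hypothetical (probability, severity, age in hours) for four observed events.
observed = [
    (0.5, 0.5, 0.0),   # unusual network share
    (0.6, 0.5, 0.0),   # unusual time of day
    (0.8, 0.8, 0.0),   # inactive source code project
    (0.9, 0.9, 0.0),   # large copy to USB
]
for count in range(1, len(observed) + 1):
    print(count, f"{entity_risk(observed[:count]):.0%}")
```

Because no single event pushes the score to the top on its own, an analyst can rank entities by score and look only at those where several independent anomalies coincide.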

Real World Example
Analyzed a large semiconductor developer community (>20,000 developers) to look for behavioral indicators of risk
- Identified 2 known source code thieves and leavers
- Identified 11 previously unknown threats
  - 2 confirmed: terminated
  - 1 confirmed: is currently under investigation
  - 8 Chinese employees replicating 600,000 to nearly 15,000,000 files per day. Currently under investigation
Visualization of Interset Cluster Leaver 1
- Dots = source code projects
- Lines connecting dots = developers using those projects

Effective Behavioral Analytics
Bad
- Rules-based alerts alone
- Classification systems alone
- Simple mean/standard deviation based thresholds, generic anomaly detection
- Hard decision boundaries
→ Flood of alerts, hard to deploy, scale and maintain
Good
- Probability-based anomaly + cost-based models
- Machine learning models
- Robust models (handle outliers, big data, respond to change)
- Numerical scores
→ Less noise, easier to deploy and scale, ability to focus on top n incidents, POI, etc.

Pulling it all together

Big Data Analytics in Security
- Adaptive Analysis: Responding to context
- Continual Analysis: Responding to local change/feedback
- Optimization under Uncertainty: Quantifying or mitigating risk
- Optimization: Decision complexity, solution speed
- Predictive Modeling: Causality, probabilistic, confidence levels
- Simulation: High fidelity, games, data farming
- Forecasting: Larger data sets, nonlinear regression
- Alerts: Rules/triggers, context sensitive, complex events
- Query/Drill Down: In memory data, fuzzy search, geo spatial
- Ad hoc Reporting: Query by example, user defined reports
- Standard Reporting: Real time, visualizations, user interaction
- Entity Resolution: People, roles, locations, things
- Relationship, Feature Extraction: Rules, semantic inferencing, matching
- Annotation and Tokenization: Automated, crowd sourced
We are here.
Source: Competing on Analytics, Davenport and Harris, 2007

Future of Big Data Analytics in Security
Intelligent Sensors and Ubiquitous Data Sources
- Desktops and Servers
- Mobile
- Cloud
- Social Networks
- Open Data, External Data, IOCs
- Reputation and Risk Services
- Enterprise to Global Systems
Behavioral and Threat Analytics Platform
- Forensic Analysis
- Risk Modeling
- Anomaly Detection
- Entity Resolution
- Behavioral Simulation
- Behavioral Prediction
- Threat Response Optimization
Advanced Threat Detection and Response
- What happened?
- How many, how often?
- Where is the risk and threat?
- How can this threat be contained?
- How can we prevent this?
- What will happen next?
- What is the best possible response to this threat?

Thank You! Questions?
Upload your logs, try out our math
Cloud-hosted Threat Analysis