Domain Hygiene as a Predictor of Badness


Tim Helming
Director, Product Management
DomainTools

Your Presenter
- Director of Product Management (aka "the roadmap guy")
- Over 13 years in cybersecurity
- Passionate about fighting the good fight!

Contents
- Meet your Presenter
- The Hypothesis
- Examples of good and bad hygiene
- Algorithms for predictive reputation scoring
- Test methodology and findings
- Future directions
- Q&A

Your Presenter's Employer
- Started by domainers as a(nother) Whois lookup site
- Added a few unique twists
  - Maintained historical DB of information
  - Went beyond canonical Whois data
  - Built research tools atop all of this
- Today: it's not just for domainers any more. Lots of cybersleuths use it.

The Hypothesis
- Malicious domains are not set up the same way as legitimate ones, much like physical dens of crime are not (usually) very similar to legit businesses
- Many of the characteristics that distinguish the legit from the illegit are visible in the public record
- These characteristics can be used to characterize domains and registrants
- This can help predict badness of domains before they strike, or even before they're registered

How Might We Use Hygiene Information?
- Add a dimension to existing reputation scoring
- Assign a higher risk profile to unhygienic domains
- Help prioritize attribution/investigation targets when potential targets abound

Bolstering Reputation Feeds: Traditional Reputation Feeds
- Tend to be malware focused (not always)
- Often require a domain to show badness, i.e. hurt victims, before listing it
- Ignore the registrant as a vector of risk, and registration details as a marker of risk

Bolstering Reputation Feeds: A Hygiene-Aware Reputation Feed
- Could raise the risk profile of domains that have not otherwise implicated themselves
- Could predictively raise the risk profile of domains as they come online (conceptually, before they come online) via registrant reputation

Does Your Domain Have Good Hygiene?
What does the public record (OSINT) tell us?
Markers of Legitimate Commercial Domains
- Linguistic coherence
- MX record (most businesses use email)
- Valid physical address
- Valid phone number
- Age (yes, we're practicing ageism, but unlike the other markers, this one is self-healing!)

Does That Domain Have Bad Hygiene?
What does the public record (OSINT) tell us?
Markers of Illegitimate Domains
- Linguistic incoherence of domain name, registrant name/email
- Lack of mail servers
- Lack of web presence
- Irrational/invalid physical address
- Irrational/invalid phone numbers

And now... To the Science!

Methodology 1.0
We started with the easy stuff.
- Built three corpuses of domains:
  - Legitimate domains (a random selection of businesses and nonprofits, and Alexa top-ranked sites)
  - Unknown (a truly random set with no qualifications)
  - Bad (a random selection from reputable malware/spam/phishing classification providers)
- And we compared their hygiene characteristics

Scoring 1.0

| Criterion (range) | Low Risk | Medium Risk | High Risk |
| Linguistic Coherence (0-2) | 0 = domain name makes sense linguistically (including acronyms/abbreviations) | 1 = questionable; high entropy but can pass the "squint test" | 2 = incoherent |
| Age (0-2) | 0 = more than 45 days | 1 = 7-45 days | 2 = less than 7 days |
| MX Record (0-1) | 0 = has MX record | (none) | 1 = no MX record |
| Web Server (0-1) | 0 = has Web server | (none) | 1 = no server |
| Physical Address Coherence (0-2) | 0 = valid address | 1 = indeterminate or partial | 2 = invalid address |
| Phone # Coherence (0-2) | 0 = valid phone number | 1 = indeterminate or partial | 2 = not a valid phone number |

Scores were composites of these.

Results: Test Corpuses 1.0

| Corpus | Score |
| Alexa Top Domains | 0.047 |
| Unknown | 0.1585 |
| Malware | 0.3615 |
| Spam | 0.573 |
| Phishing | 0.6905 |

Why is malware in the middle? This likely reflects the combination of pwned and evil sites (pwned ones would in many cases have good hygiene; evil ones, not so much).
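
To make the composite concrete, here is a minimal sketch of how the Scoring 1.0 criteria could be combined. The slides do not say how the composite was computed or normalized. Dividing the summed points by the maximum possible total (10) is an assumption, chosen only because the reported corpus scores fall between 0 and 1. The field names and the example domain values are hypothetical.

```python
# Maximum points per criterion, taken from the Scoring 1.0 table.
MAX_POINTS = {
    "linguistic_coherence": 2,
    "age": 2,
    "mx_record": 1,
    "web_server": 1,
    "address_coherence": 2,
    "phone_coherence": 2,
}


def age_points(age_days):
    """Map domain age in days to the table's 0-2 age score."""
    if age_days > 45:
        return 0
    if age_days >= 7:
        return 1
    return 2


def composite_score(points):
    """Sum the per-criterion points and divide by the maximum total.

    The division is an ASSUMED normalization; the talk only says that
    scores were composites of the criteria.
    """
    for name, value in points.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} out of range: {value}")
    return sum(points.values()) / sum(MAX_POINTS.values())


# Hypothetical three-day-old domain with an incoherent name and no MX record.
example = {
    "linguistic_coherence": 2,
    "age": age_points(3),
    "mx_record": 1,
    "web_server": 0,
    "address_coherence": 1,
    "phone_coherence": 2,
}
print(composite_score(example))
```

Under this assumed normalization, a corpus average near 0.05 would mean almost every criterion scored 0, while one near 0.7 would mean most criteria were at or near their maximum.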

Methodology 2.0
Okay, that was the easy stuff. Now for more rigor.

Methodology 2.0
- Using AI to assess entropy in Whois record fields
- All random sampling rather than hand-picked corpuses
- Spot checks comparing high-score (high badness) vs. low-score domains
- Do new bad domains show up on malware lists?

Methodology 2.0: Using AI to classify threats based on Whois
Step 1: Measurements on each record
- Linguistic analysis of domain names and other text-based Whois data fields
  - Frequency of linguistically rational bigrams
  - Letter, number, symbol, vowel ratios

Methodology 2.0: More AI to classify threats based on Whois
Step 1 (cont'd): Measurements on each record
- Analysis of contact information
  - Impossible names (entropy)
  - Improbable names ("Donald Duck")
  - Impossible phone numbers
  - Improbable region words (e.g. Alabama in France)
  - Bad postal codes
  - Throwaway email domains in contact emails
  - Generic email domains in contact emails
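
The linguistic measurements above can be sketched with a few lines of standard-library Python. This is an illustration only, not the presenter's implementation. The function names are hypothetical, the entropy is plain Shannon entropy over characters, and only the leftmost label of the domain is measured. The bigram-frequency measurement is left out because it needs a reference corpus.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def name_features(domain):
    """Length, entropy, and character-class ratios for the leftmost label."""
    label = domain.lower().split(".")[0]
    length = len(label)
    if length == 0:
        raise ValueError("empty domain label")
    letters = sum(c.isalpha() for c in label)
    digits = sum(c.isdigit() for c in label)
    vowels = sum(c in VOWELS for c in label)
    return {
        "length": length,
        "entropy": shannon_entropy(label),
        "letter_ratio": letters / length,
        "digit_ratio": digits / length,
        "symbol_ratio": (length - letters - digits) / length,
        "vowel_ratio": vowels / length,
    }


for name in ("example.com", "829fh92-s8s.com"):
    print(name, name_features(name))
```

The same functions could be pointed at registrant names or email local parts, which is where the "impossible names (entropy)" measurement would apply.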

Methodology 2.0: More AI to classify threats based on Whois
Step 1 (cont'd): Measurements on each record
- Domain names matching hash patterns (md5/sha)
- Privacy protection (and similar)
- Registration duration and age
- Indicators of data completeness

Methodology 2.0: Various domain / Whois hygiene indicators into AI
- Unsupervised Classification (e.g. k-means)
  - Find ~15 classes of domains with similar scores in the hygiene space
  - Look at counts of threats (spam, botnet, etc.) in each class
- Supervised Classification (e.g. random forests)
  - Fit a model to a blacklist and predict blacklists of the future
- Calibration Step: Exploratory Stats
  - Summarize domain hygiene data to better design classifiers
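
One of the Step 1 measurements, domain names matching hash patterns, is simple enough to sketch. The version below assumes that "hash pattern" means a label consisting entirely of hex characters at the length of an MD5 (32) or SHA-1 (40) digest. The talk does not define the pattern, so both the rule and the function name are assumptions.

```python
import re

# 32 hex characters is the length of an MD5 digest, 40 of a SHA-1 digest.
# A full 64-character SHA-256 digest cannot appear as a single DNS label,
# because labels are limited to 63 octets.
HASH_LABEL = re.compile(r"^(?:[0-9a-f]{32}|[0-9a-f]{40})$")


def looks_like_hash(domain):
    """Return True if any label of the domain is an MD5- or SHA-1-shaped hex string."""
    labels = domain.lower().rstrip(".").split(".")
    return any(HASH_LABEL.match(label) is not None for label in labels)


print(looks_like_hash("d41d8cd98f00b204e9800998ecf8427e.com"))
print(looks_like_hash("example.com"))
```

A boolean like this would be one more column in the hygiene feature set that the unsupervised and supervised classifiers consume.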

Results 2.0: Ranking of Type by Score
Some Results from AI Scoring
Ranking by type, based on length and entropy:
1. Botnet (longest, highest entropy)
2. Malware
3. Phishing
4. Spam
5. Not a Known Threat (shortest, lowest entropy)
Each parameter (length, entropy) gave the same ranking.

Close-Up: Relative Entropy by Type
How do different types of domains compare by domain name entropy?

Close-Up: Relative Length by Type
How do different types of domains compare by domain name length?

Zooming Back Out
Some Practical Considerations

A word about false positives
- Reputation scoring is not blacklisting
- More like actuarial risk scoring
- Hygiene is correlated with, but not deterministically attached to, legitimacy
- One (or two) higher-risk attributes do not sink a domain
- However...

A word about false positives
- Hygiene-based false positives behave differently
- They are low-risk FPs
- Some such domains will be assigned high risk scores when they aren't hosting malware, but...
- Does anyone really need to visit 829fh92-s8s.com?
- IOW, they are very unlikely to be legit
- Ergo, FPs based on hygiene scoring aren't so likely to be an IT headache

Using Hygiene Analysis Today
You can use this approach in investigations today (modulo scale)
- As a filtering method: given a pool of suspicious domains, examine Whois records: do they pass the bogon sniff test? (Parsed records make this easier.) Focus on the bogon ones first.
- As a connection signal: given a pool of domains which may or may not be linked, do they have common registrant info? Or, does a given bogon registrant hold more than the one domain you started looking at?

Using Hygiene Analysis Today
...and you can use it as a defense mechanism, albeit manually
- Identify registrants of evil domains (it doesn't matter if the info is bogus, as long as you've got the same registrant)
- Do reverse lookups of these registrants to see if they own other domains you've not seen
- Add the other domains to blacklists, sinkholes, or other configuration inputs
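
The registrant pivot described above can be sketched over a local set of parsed Whois records. Everything here is illustrative: the record layout (`domain` and `registrant_email` keys), the sample data, and the function names are hypothetical, and a real reverse lookup would query a Whois history service rather than an in-memory list.

```python
from collections import defaultdict


def normalize(value):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(value.lower().split())


def domains_by_registrant(records):
    """Index domains by normalized registrant email (a local 'reverse lookup')."""
    index = defaultdict(set)
    for rec in records:
        index[normalize(rec["registrant_email"])].add(rec["domain"].lower())
    return index


def expand_from_known_bad(records, known_bad):
    """Return domains that share a registrant with any known-bad domain."""
    known = {d.lower() for d in known_bad}
    related = set()
    for domains in domains_by_registrant(records).values():
        if domains & known:
            related |= domains - known
    return related


# Hypothetical parsed Whois records.
records = [
    {"domain": "bad-one.example", "registrant_email": "x@mail.example"},
    {"domain": "unseen.example", "registrant_email": "X@mail.example "},
    {"domain": "other.example", "registrant_email": "y@mail.example"},
]
print(expand_from_known_bad(records, ["bad-one.example"]))
```

Keying on email alone is a simplification. The same index could be built on registrant name, phone number, or address, since bogus details still link domains as long as they are reused.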

Future Directions
- Iterate, refine, improve
- Watch for registration trends that could affect the AI algorithms or suggest new ones
- Integrate with our other branch of reputation tech ("proximity to badness")
- Keep exploring

Your Turn
Q & A

DomainTools says Thank You!