EVILSEED: A Guided Approach to Finding Malicious Web Pages

+ EVILSEED: A Guided Approach to Finding Malicious Web Pages
Presented by: Alaa Hassan
Supervised by: Dr. Tom Chothia

+ Outline
Introduction
Introducing EVILSEED
EVILSEED Architecture
Effectiveness of EVILSEED
Discussion and Limitations
Conclusion

+ Searching the Web
How would you identify a page as malicious?
In your opinion, are the current techniques for identifying malicious pages effective?

+ Identifying Malicious Web Pages Is a Challenging Task
The web is a very large place.
Every day, new pages, both legitimate and malicious, are added to the web at a daunting pace.
Attackers regularly scan for vulnerable hosts that they can exploit to store malicious pages.
Infected hosts are organized in complex malicious meshes to increase the chances of users landing on them.

+ Searching the Web: A Three-Step Process
Crawlers collect URLs in bulk.
Fast prefiltering quickly discards pages that are very likely to be legitimate.
Oracles slowly and carefully analyze the remaining pages and detect malicious content using special tools, such as honeyclients.
This approach is effective but not efficient:
  o Resource consuming.
  o Time consuming.
  o Costly.

+ A Much More Efficient Approach
EVILSEED is a guided approach that finds malicious web pages much more efficiently:
Improves the efficiency of the web crawling phase.
Starts from a set of known malicious pages:
  o Legitimate web pages that have been compromised.
  o Pages set up by cybercriminals.
Generates search engine queries to find pages that share certain similarities with the known malicious pages: a guided search rather than a random search.
Allows gathering URLs with high toxicity (a high fraction of malicious pages).

+ Advantages of EVILSEED
A URL found by EVILSEED is much more likely to be malicious than a web page found by random crawling.
For a fixed amount of resources, more malicious pages are found.
Much faster.
Could be beneficial to search engines.

+ Why Does EVILSEED Work?
Malicious pages usually share similarities:
  o Attackers usually search the web for patterns associated with vulnerable web applications, which they exploit by injecting malicious code into their pages.
  o Attackers use exploit toolkits to create their attack pages.
  o Many compromised pages often link to the same malicious page.
EVILSEED makes use of available, up-to-date tools and datasets in the guided search process:
  o Passive DNS feeds.
  o The Google and Bing crawler infrastructure, which has indexed a large portion of the web and is always up to date.

+ EVILSEED Components
Seed: the (evil) seed is a set of pages that have previously been found to be malicious.
Gadgets: the core of EVILSEED. They
  o extract information from the seed pages,
  o build search engine queries based on that information (expansion),
  o collect the URLs returned by the guided search and pass them to the oracle for further analysis.
Oracle: performs the in-depth analysis, using
  o Google's Safe Browsing blacklist,
  o Wepawet, a service for detecting and analyzing web-based threats,
  o a custom-built tool to detect fake AV sites.

+ EVILSEED Architecture

+ Gadgets
EVILSEED implements five gadgets:
Links Gadget: uses the web topology (web graph) to find pages that link to malicious resources.
Content Dorks Gadget: identifies vulnerable and exploited web applications.
Search Engine Optimization (SEO) Gadget: analyzes seed pages that belong to blackhat SEO campaigns.
Domain Registrations Gadget: identifies suspicious sequences of domain registrations.
DNS Queries Gadget: analyzes traces of DNS requests to locate pages that lead to a malicious domain.

+ Links Gadget
Locates malware hubs (pages that contain links to several malicious URLs).
Seed: all URLs known to be malicious.
Expansion:
  o Searches for malware hubs that link to the seed pages.
  o Forms search queries that are sent to Google, Bing and Yacy to distribute the load.
  o Retrieves the returned pages and extracts all outgoing links from each one.
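The hub-finding step can be sketched offline as follows. This is only an illustration, not the gadget's actual implementation: it assumes the search results have already been fetched and parsed into a mapping from page URL to outgoing links, and the function names, the `min_malicious_links` threshold, and the example URLs are all hypothetical.

```python
def find_hubs(outgoing_links, seed_urls, min_malicious_links=2):
    """Return pages that link to at least `min_malicious_links` seed URLs.

    `outgoing_links` maps a page URL to the list of links found on it
    (assumed to be collected beforehand by a crawler or search query).
    """
    seeds = set(seed_urls)
    hubs = {}
    for page, links in outgoing_links.items():
        hits = seeds.intersection(links)
        if len(hits) >= min_malicious_links:
            hubs[page] = hits
    return hubs


def expand_candidates(outgoing_links, hubs, seed_urls):
    """Collect the non-seed links on hub pages as candidates for the oracle."""
    seeds = set(seed_urls)
    candidates = set()
    for page in hubs:
        candidates.update(link for link in outgoing_links[page] if link not in seeds)
    return candidates


# Hypothetical data: one page links to both seeds, the other to only one.
seed_urls = ["http://bad1.example/a", "http://bad2.example/b"]
outgoing_links = {
    "http://hub.example/list": [
        "http://bad1.example/a",
        "http://bad2.example/b",
        "http://unknown.example/c",
    ],
    "http://blog.example/post": ["http://bad1.example/a", "http://news.example/"],
}

hubs = find_hubs(outgoing_links, seed_urls)
candidates = expand_candidates(outgoing_links, hubs, seed_urls)
```

Only the links on pages that qualify as hubs become candidates, which keeps the number of URLs sent to the oracle small.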

+ Content Dorks Gadget
Automates the generation of relevant Google dorks by automatically identifying suitable ones.
Google dorks (search queries crafted to locate vulnerable web applications) are at the center of the Google Hacking Database.
Many attackers use Google to find vulnerable web pages and later exploit these vulnerabilities.

+ Content Dorks Gadget
Seed: legitimate web pages that have been compromised by attackers (landing pages). Such pages
  o contain indexable content,
  o remain online longer,
  o share characteristics that can be identified.
Expansion: queries are based on n-grams of words extracted from the indexable content. An n-gram is a contiguous sequence of n words; n-gram models are probabilistic language models that predict the next item in a sequence from the previous n-1 items.
  o Term extraction: extracts the terms that best summarize the content of the page.
  o n-gram selection: extracts all sequences of n words from a landing page and ranks them according to their likelihood of occurring in a malicious page versus a benign page.
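A minimal sketch of the n-gram selection step, under stated assumptions: the slides do not give the ranking formula, so the smoothed ratio of document frequencies used here is an illustrative stand-in, and the tokenizer, function names, and sample texts are hypothetical.

```python
import re
from collections import Counter


def word_ngrams(text, n):
    """Return all contiguous sequences of n lowercase words in `text`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def rank_ngrams(landing_page_text, malicious_pages, benign_pages, n=3):
    """Rank the landing page's n-grams, most 'malicious-looking' first.

    The score is the ratio of smoothed document frequencies in the malicious
    and benign corpora (an assumed scoring rule, not the original one).
    """
    def document_frequency(pages):
        counts = Counter()
        for page in pages:
            counts.update(set(word_ngrams(page, n)))
        return counts

    mal = document_frequency(malicious_pages)
    ben = document_frequency(benign_pages)
    scores = {}
    for gram in set(word_ngrams(landing_page_text, n)):
        p_mal = (mal[gram] + 1) / (len(malicious_pages) + 2)  # add-one smoothing
        p_ben = (ben[gram] + 1) / (len(benign_pages) + 2)
        scores[gram] = p_mal / p_ben
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda g: (-scores[g], g))


# Hypothetical corpora.
malicious_pages = ["powered by examplecms 1 0", "site powered by examplecms 1 0"]
benign_pages = ["powered by coffee and tea", "all rights reserved"]
bigrams = word_ngrams("Powered by ExampleCMS v1", 2)
ranked = rank_ngrams("powered by examplecms 1 0", malicious_pages, benign_pages, n=2)
```

In this toy data, "powered by" also appears on a benign page, so it ranks last; the n-grams seen only on malicious pages would make better search queries.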

+ Search Engine Optimization Gadget
Cybercriminals use a variety of techniques to drive traffic to the malicious pages under their control.
Blackhat Search Engine Optimization (SEO) techniques:
  o Attackers host many different web pages, optimized for different search terms, on each web site in a campaign.
  o Attackers host pages optimized for the same search terms on different web sites in a campaign.
  o Pages in a campaign often link to each other.
SEO kits use semantic cloaking:
  o Exploited web sites respond with completely different content depending on the source of a request.

+ Search Engine Optimization Gadget
Seed: at least one malicious URL that is part of a live SEO campaign.
Seeds are identified by detecting redirection-based cloaking, which is the kind mostly used in blackhat SEO campaigns:
  o Visit the URL three times, each time with a different value. If two or more different landing pages appear, cloaking is detected.
Expansion: one cloaked URL leads to other malicious pages from the same campaign.
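The three-visit heuristic can be sketched as below. The slide does not say which value changes between visits; this sketch assumes it is the HTTP request headers (User-Agent and Referer), which is an assumption. The `fetch_landing_page` callable is a hypothetical caller-supplied function that returns the final URL after redirects, and the two fake sites stand in for real network requests.

```python
def detect_cloaking(url, fetch_landing_page, request_profiles):
    """Flag `url` as cloaked if the visits end on two or more different pages."""
    landing_pages = {fetch_landing_page(url, profile) for profile in request_profiles}
    return len(landing_pages) >= 2


# Assumed request profiles: which headers actually vary is not given on the slide.
PROFILES = [
    {"User-Agent": "ExampleBrowser/1.0", "Referer": "https://www.google.com/search?q=example"},
    {"User-Agent": "ExampleBrowser/1.0", "Referer": ""},
    {"User-Agent": "ExampleSearchBot/1.0", "Referer": ""},
]


def cloaking_site(url, profile):
    """Fake site that redirects only visitors arriving from a search engine."""
    if "google" in profile["Referer"]:
        return "http://fake-av.example/scan"
    return url


def honest_site(url, profile):
    """Fake site that serves the same page to every visitor."""
    return url


cloaked = detect_cloaking("http://seo.example/page", cloaking_site, PROFILES)
clean = detect_cloaking("http://blog.example/post", honest_site, PROFILES)
```

Passing the fetcher in as a parameter keeps the heuristic separate from the HTTP client, so it can be exercised without touching live (and possibly malicious) sites.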

+ Domain Registrations Gadget
Blacklists are one of the best-known techniques to protect against web malware.
Domain-based blacklists contain domains that have been discovered to host malicious content.
Seed: all the domains that are known to host malicious pages, together with domain registration records, which are freely available online.
Expansion: extracts and flags domains registered close to those of malicious URLs, then creates candidate URLs by taking the closest known malicious URL and replacing its domain with the flagged one.
This gadget does not use search engines, but it still follows the guided search process when creating the URLs.
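A minimal sketch of this expansion, assuming the registration records are available as a chronologically ordered list of domain names. The function names and the example domains are hypothetical, and flagging only the immediate neighbours is a simplification.

```python
from urllib.parse import urlsplit, urlunsplit


def flag_neighbours(registrations, malicious_domains):
    """Flag domains registered immediately before or after a known malicious one.

    `registrations` is assumed to be sorted by registration time.
    """
    flagged = []
    for i, domain in enumerate(registrations):
        if domain not in malicious_domains:
            continue
        for j in (i - 1, i + 1):
            if 0 <= j < len(registrations):
                neighbour = registrations[j]
                if neighbour not in malicious_domains and neighbour not in flagged:
                    flagged.append(neighbour)
    return flagged


def candidate_url(malicious_url, flagged_domain):
    """Build a candidate URL by swapping the domain of a known malicious URL."""
    parts = urlsplit(malicious_url)
    return urlunsplit(parts._replace(netloc=flagged_domain))


# Hypothetical registration sequence and seed.
registrations = ["alpha.example", "bad-shop1.example", "bad-shop2.example", "zeta.example"]
flagged = flag_neighbours(registrations, {"bad-shop1.example"})
candidate = candidate_url("http://bad-shop1.example/load.php?id=7", "bad-shop2.example")
```

The candidate URLs are guesses and may not exist; they are only worth checking because the oracle has the final say.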

+ DNS Queries Gadget
Analyzes recursive DNS traces to identify the domain names of compromised landing pages that are likely to lead to malicious pages.
Seed: all domains known to host malicious pages.
Expansion: relies on the observation that a large number of infected pages contain links to a single malicious page, and that DNS traces (partially) expose these connections.
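One way to picture how a DNS trace exposes these connections is to look at what each client resolved shortly before resolving a known malicious domain. The sketch below is an illustration only: the record layout `(timestamp, client_id, domain)`, the 4-second window, and the function name are assumptions, not details from the slides.

```python
from collections import Counter


def domains_preceding_malicious(trace, malicious_domains, window_seconds=4.0):
    """Count how often each domain was queried just before a malicious one.

    `trace` is an iterable of (timestamp, client_id, domain) tuples sorted by
    timestamp. A domain is counted once per malicious lookup by the same
    client within `window_seconds`.
    """
    recent = {}  # client_id -> list of (timestamp, domain) still inside the window
    counts = Counter()
    for timestamp, client, domain in trace:
        history = recent.setdefault(client, [])
        # Drop queries that have fallen out of the time window.
        history[:] = [(t, d) for t, d in history if timestamp - t <= window_seconds]
        if domain in malicious_domains:
            for earlier in {d for _, d in history}:
                if earlier not in malicious_domains:
                    counts[earlier] += 1
        history.append((timestamp, domain))
    return counts


# Hypothetical trace: two clients visit forum.example right before evil.example.
trace = [
    (0.0, "c1", "forum.example"),
    (1.0, "c1", "evil.example"),
    (2.0, "c2", "news.example"),
    (30.0, "c2", "evil.example"),
    (40.0, "c3", "forum.example"),
    (41.5, "c3", "evil.example"),
]
counts = domains_preceding_malicious(trace, {"evil.example"})
```

Domains that repeatedly precede a malicious lookup across different clients are candidate landing pages to hand to the oracle.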

+ Effectiveness of EVILSEED
Two key metrics measure the effectiveness of EVILSEED:
Toxicity: the fraction of the URLs submitted to the oracles that are malicious. Higher toxicity implies that the resources needed to analyze a page are used more efficiently.
Seed expansion: the average number of new malicious URLs that EVILSEED finds for each seed. A higher seed expansion indicates that a larger number of malicious URLs are found for each malicious seed URL.
There is a trade-off between toxicity and seed expansion.
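Written out as formulas, the two metrics look as follows. The symbols are notation introduced here for illustration: $S$ is the set of URLs submitted to the oracles, $N_{\text{new}}$ the number of new malicious URLs found, and $N_{\text{seed}}$ the number of seed URLs.

```latex
\[
\text{toxicity} = \frac{\lvert \{\, u \in S : u \text{ is malicious} \,\} \rvert}{\lvert S \rvert},
\qquad
\text{seed expansion} = \frac{N_{\text{new}}}{N_{\text{seed}}}
\]
```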

+ A Test Run
EVILSEED ran in parallel with a traditional crawler for 25 days.
Malicious URLs found by the crawler were added to the EVILSEED seeds.
Oracles used: Wepawet, Google Safe Browsing, and a custom fake AV detector.
All gadgets were used except the DNS queries gadget (no access to DNS trace datasets) and the domain registrations gadget (not fully developed).

+ A Test Run
EVILSEED was assessed against two other approaches to finding malicious web pages:
  o Random search (sending queries to search engines).
  o A traditional crawler with a fast prefilter.
The web queries for the random search were generated from:
  o Random alphabetic phrases, composed of 1 to 5 words of 3 to 10 characters each (e.g., "asdf qwerou").
  o Random phrases of 1 to 5 words taken from the English dictionary (e.g., "happy cat").
  o Trending topics taken from Twitter and Google Hot Trends (e.g., "black friday 2011").
  o Manually generated Google dorks, taken from an online repository (e.g., allinurl:forcedownload.php?file=, which locates vulnerable WordPress sites).
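The first two query types are easy to reproduce. The sketch below follows the word counts and lengths given above; the function names and the tiny stand-in dictionary are hypothetical.

```python
import random
import string


def random_alphabetic_phrase(rng):
    """1 to 5 random lowercase 'words', each 3 to 10 characters long."""
    words = []
    for _ in range(rng.randint(1, 5)):
        length = rng.randint(3, 10)
        words.append("".join(rng.choice(string.ascii_lowercase) for _ in range(length)))
    return " ".join(words)


def random_dictionary_phrase(rng, dictionary):
    """1 to 5 words drawn at random from `dictionary`."""
    return " ".join(rng.choice(dictionary) for _ in range(rng.randint(1, 5)))


rng = random.Random(0)  # fixed seed so repeated runs give the same queries
alphabetic_query = random_alphabetic_phrase(rng)
# Tiny stand-in for an English dictionary.
dictionary_query = random_dictionary_phrase(rng, ["happy", "cat", "blue", "river"])
```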

+ Results
EVILSEED:
  o submitted 226,140 URLs to the oracles,
  o 3,036 URLs were found malicious,
  o toxicity of 1.34%.
The crawler and prefilter:
  o submitted 437,251 URLs to the oracles,
  o 604 URLs were found malicious (these are the URLs used as seeds for EVILSEED),
  o toxicity of 0.14%, an order of magnitude less than EVILSEED.
The web search:
  o submitted 63,936 URLs to the oracles,
  o 219 URLs were found malicious,
  o toxicity of 0.34%.
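The toxicity figures follow directly from the counts above, as this short check shows. The dictionary layout and function name are illustrative; the numbers are the ones reported on the slide.

```python
# (URLs submitted to the oracles, URLs found malicious), as reported above.
results = {
    "EVILSEED": (226_140, 3_036),
    "Crawler & prefilter": (437_251, 604),
    "Web search": (63_936, 219),
}


def toxicity_percent(submitted, malicious):
    """Fraction of submitted URLs that are malicious, as a percentage."""
    return 100.0 * malicious / submitted


toxicity = {name: toxicity_percent(*counts) for name, counts in results.items()}
ratio_vs_crawler = toxicity["EVILSEED"] / toxicity["Crawler & prefilter"]

for name, value in toxicity.items():
    print(f"{name}: {value:.2f}%")
```

The ratio between the EVILSEED and crawler toxicities comes out at a little under 10, which matches the "order of magnitude" claim.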

+ Results
EVILSEED clearly outperforms both crawling (1.34% vs. 0.14%) and web searching (1.34% vs. 0.34%) in toxicity.
Adding even relatively few new pages to the set of evil seeds enabled EVILSEED to locate significant numbers of additional malicious pages.

+ Does EVILSEED find malicious URLs on different domains?
EVILSEED: 6.14 malicious pages per domain.
Crawler and fast prefilter: 6.16 malicious pages per domain.
The results show that EVILSEED maintains the same domain coverage as the crawler.

+ Links Gadget Evaluation
The links gadget located malicious content through three main categories of sites:
  o Unmaintained websites: the gadget found malicious content on such websites.
  o Domains that publish blacklists of malicious domains: the gadget was able to automatically discover and parse these sources.
  o Domains that list additional information about a domain. For a given domain, they list all domains on the same IP, domains hosted in the same subnet, and domains with similar spelling.

+ Content Dorks Gadget Evaluation
The most important factor in the success of this gadget was found to be n, the length of the n-grams.
Shorter n-grams are usually found in more pages.
Toxicity for the results of queries ranged from 1.21% for 2-grams to 5.83% for 5-grams.
Shorter n-grams mean that more pages compete for the top spots in the search engine rankings.
The ten most successful dorks in terms of toxicity were five 2-grams and five 3-grams.

+ SEO Gadget Evaluation
During the test run, this gadget performed poorly because the seed pages available at the time did not belong to a live SEO campaign.
To obtain suitable seeds, the top trends from Twitter and Google Hot Trends were fetched hourly and searched for on Google, and the results were analyzed with the cloaking detection heuristic. The resulting URLs were then fed to the SEO gadget as seeds.
The ratio of malicious pages found to pages visited is 0.93%, nearly two orders of magnitude higher than for the crawler (0.019%).

+ Domain Registrations Gadget Evaluation
Domain registrations for the top-level domains .com, .net, .org, .info and .us were collected over a year.
The gadget identified malicious URLs on 10,435 domains using 1,002 domains as seeds.
Hypothesis: malicious domains are registered close in time to each other.
  o Given one malicious domain, at least one of the registrations that come immediately before or after it is also malicious.
The data collected over the year showed that these two events are correlated.
Conclusion: domains registered immediately before or after a known malicious domain are more than 35 times more likely to also serve malicious content.

+ DNS Queries Gadget Evaluation
Testing: an Internet Service Provider (ISP) provided access to a DNS trace collected from its network over 43 days in February and March 2011.
The trace contains 377,472,280 queries sent by 30,000 clients.
The trace was made available towards the end of the collection period, which caused a delay between the collection of the data and the time when the gadget was run.
Seed: 115 known malicious domains from the trace.
Expansion: the gadget generated 4,820 URLs on 2,473 domains.
Result:
  o 171 URLs on 62 domains were identified as malicious.
  o Only 25 of the 115 seed domains led to finding malicious URLs.
  o The most effective domain guided the gadget to locate 46 malicious URLs on 16 different servers.
  o 21 domains led to multiple malicious URLs.
  o The delay explains why no malicious URLs were found for the remaining 90 seed domains.

+ Discussion and Limitations
Security analysis: EVILSEED works by searching for and finding malicious URLs.
  o An attacker with full control of an exploited website can hide its pages so that they are not indexed by search engines.
  o Attackers could also try to perform evasion attacks against the detection techniques employed by our oracle (Wepawet, our custom fake AV page detector, and the Safe Browsing system).
Would attackers go as far as hiding their pages from search engines?
What if we connect EVILSEED to another oracle?

+ Discussion and Limitations
Seed quality: the effectiveness of the gadgets depends on the quality and diversity of the malicious seed that they use as input.
Results over time: to be useful, EVILSEED needs to provide a constant stream of high-quality URLs rather than exhausting its effect after one or a few runs.

+ Discussion and Limitations
Performance and scalability: the bottleneck of EVILSEED is the cost of performing in-depth analysis with an oracle.
EVILSEED runs on two servers:
  o Crawler: gathers millions of URLs.
  o Gadgets: 100k URLs per search engine.
Deployment: search engines could deploy EVILSEED. This might diminish its effectiveness, but it would also mean that the vectors EVILSEED targets had been mitigated.

+ Conclusion
An important component of defense is the ability to identify as many malicious web pages on the Internet as possible in an efficient manner.
The goal of EVILSEED was to improve the effectiveness of the search process for malicious web pages by leveraging a seed of known malicious web pages and extracting the characterizing similarities that these pages share.

+ Thank you..