ENTRADA: Enabling DNS Big Data Applications




ENTRADA: Enabling DNS Big Data Applications
Maarten Wullink, SIDN
APWG eCrime 2016, June 2nd, 2016, Toronto

What if
You have many TBs of network data? And you want to:
1. Store it efficiently
2. Query it efficiently (SQL with interactive response times)
3. Quickly test a large number of hypotheses on your data
4. Continuously keep adding new data

You could
1. Convert pcap to a text format like csv and use Linux utilities
2. Run Hadoop MapReduce jobs on csv/pcap
3. Store it in an RDBMS
With most of these options it will be hard to scale and deliver interactive response times

What to do?
Build your own data stream warehouse (DSW)
ENTRADA is our open source Hadoop-based DSW (entrada.sidnlabs.nl)
Analyze 50 TB of converted pcap data in under 3.5 minutes using a small cluster
Our main use case: network (DNS, TCP/IP, ICMP) analytics

ENTRADA
ENhanced Top-Level Domain Resilience through Advanced Data Analysis
[Figure: ENTRADA architecture. Labels: ENTRADA Applications and Services, Support Libraries, Impala, HDFS, Parquet, Workflow Manager, DNS Library, PCAP Converter, ENTRADA Platform, Name Servers. Legend: generic components, ENTRADA-specific components.]

ENTRADA@SIDN
We are the TLD registry of the Netherlands (.nl)
We use ENTRADA to further increase security and stability
Operational for over 2 years
Capturing data from .nl name servers
160 billion rows (DNS query+response tuples), 21 TB of data
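As a rough back-of-the-envelope check, the two figures on this slide imply an average stored size per query+response row. The sketch below assumes decimal terabytes (10^12 bytes) and that the 21 TB refers to the stored data for exactly these 160 billion rows. The slide states neither assumption.

```python
# Figures quoted on the slide.
ROWS = 160_000_000_000        # DNS query+response tuples
DATA_BYTES = 21 * 10**12      # 21 TB, assuming decimal terabytes (an assumption)

# Average storage footprint of one row under the assumptions above.
bytes_per_row = DATA_BYTES / ROWS
print(f"{bytes_per_row:.2f} bytes per row")
```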

More ENTRADA details
For design choices and a performance evaluation, see our 2016 NOMS paper:
ENTRADA: a High-Performance Network Traffic Data Streaming Warehouse, IEEE/IFIP Network Operations and Management Symposium 2016 (NOMS 2016), Istanbul, Turkey
See: https://www.sidnlabs.nl/publicaties

Example Use Cases
Statistics (stats.sidnlabs.nl)
Scientific research
Insight for DNS operators
Malicious domain detection
Botnet client detection
Measuring uptake of email security

Malicious Domain Detection (1/2)
Observation: New phishing domains have distinct query patterns
[Figure: two panels, "Random Sample" and "Phishing" (Jan-Mar, 2015), each plotting Daily Queries against Days After Registration.]
G. Moura, M. Müller, M. Wullink, and C. Hesselman, nDEWS: a New Domains Early Warning System for TLDs, IEEE/IFIP International Workshop on Analytics for Network and Service Management (AnNet 2016), https://www.sidnlabs.nl/publicaties

Malicious Domain Detection (2/2)
Every day workflow
[Figure: workflow diagram. Labels: Registry DB, ENTRADA, Newly Registered Domains, Domain Characteristics, Cluster Domains, Normal, Suspicious, Notify Registrar.]
Domain characteristics:
Σ PReq: popularity
Σ PIPs: resolver diversity
Σ PCC: country diversity
Σ PASes: AS diversity
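The four domain characteristics can be illustrated with a small sketch that aggregates query records per domain. This is not the nDEWS implementation: the `Query` record layout, the function name `domain_features` and the sample data are hypothetical, and the clustering step from the slide is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """One observed DNS query (hypothetical layout, not the ENTRADA schema)."""
    domain: str
    resolver_ip: str
    country: str
    asn: str


def domain_features(queries):
    """Return {domain: (PReq, PIPs, PCC, PASes)} for the given queries."""
    counts = defaultdict(int)      # PReq: number of queries (popularity)
    ips = defaultdict(set)         # PIPs: distinct resolver addresses
    countries = defaultdict(set)   # PCC: distinct resolver countries
    ases = defaultdict(set)        # PASes: distinct resolver ASes
    for q in queries:
        counts[q.domain] += 1
        ips[q.domain].add(q.resolver_ip)
        countries[q.domain].add(q.country)
        ases[q.domain].add(q.asn)
    return {
        d: (counts[d], len(ips[d]), len(countries[d]), len(ases[d]))
        for d in counts
    }


# Made-up sample using documentation IP ranges and private-use AS numbers.
sample = [
    Query("new-shop.nl", "192.0.2.1", "NL", "AS64500"),
    Query("new-shop.nl", "192.0.2.1", "NL", "AS64500"),
    Query("login-verify.nl", "192.0.2.1", "NL", "AS64500"),
    Query("login-verify.nl", "198.51.100.7", "US", "AS64501"),
    Query("login-verify.nl", "203.0.113.9", "BR", "AS64502"),
]
features = domain_features(sample)
for domain, vector in sorted(features.items()):
    print(domain, vector)
```

In a workflow like the one on the slide, vectors of this kind would be computed daily for newly registered domains and passed to a clustering step that separates normal from suspicious domains.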

Botnet Client Detection (1/2)
[Figure: botnet client detection setup. Labels: Bot (three instances), Shared (ISP) Resolver, .nl auth Name server, ENTRADA Platform, Detection Algorithm, Privacy Policy, .nl Privacy Board, Abuse desk.]

Botnet Client Detection (2/2)

Uptake of DKIM/DMARC (1/3)
Email security standards DKIM (RFC 6376) and DMARC (RFC 7489)
Approach: count standardized labels
Where is DKIM/DMARC used most?
    select country, count(1) as total
    from dns.queries
    where qtype=16
      and (qname like '%_domainkey.%' or qname like '_dmarc.%')
      and rcode=0
      and ((year=2014 and month>6) or year=2015)
    group by country
Use standard SQL for analysis
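To try the query shape without a cluster, the sketch below runs it against a toy table using Python's built-in sqlite3 module. ENTRADA itself uses Impala, so this only illustrates the logic: the table holds just the six columns the query touches, the rows are made up, and the `order by` clause is added here only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database so the table can be addressed as dns.queries.
conn.execute("ATTACH DATABASE ':memory:' AS dns")
conn.execute(
    "CREATE TABLE dns.queries ("
    "qname TEXT, qtype INTEGER, rcode INTEGER, "
    "country TEXT, year INTEGER, month INTEGER)"
)
rows = [
    ("selector1._domainkey.example.nl.", 16, 0, "US", 2015, 1),
    ("_dmarc.example.nl.", 16, 0, "US", 2014, 9),
    ("_dmarc.example.nl.", 16, 0, "NL", 2015, 3),
    ("_dmarc.example.nl.", 16, 0, "NL", 2014, 2),  # before the time window
    ("www.example.nl.", 1, 0, "NL", 2015, 3),      # A query, not TXT (16)
    ("_dmarc.missing.nl.", 16, 3, "DE", 2015, 4),  # rcode 3 (NXDOMAIN)
]
conn.executemany("INSERT INTO dns.queries VALUES (?, ?, ?, ?, ?, ?)", rows)

# Same filter as on the slide. In LIKE, '_' is a single-character wildcard,
# so these patterns are slightly broader than a literal underscore match.
sql = """
select country, count(1) as total
from dns.queries
where qtype = 16
  and (qname like '%_domainkey.%' or qname like '_dmarc.%')
  and rcode = 0
  and ((year = 2014 and month > 6) or year = 2015)
group by country
order by total desc, country
"""
result = conn.execute(sql).fetchall()
print(result)
```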

Uptake of DKIM/DMARC (2/3)
Country   # Queries     Percentage
US        208,533,790   42.60
IE         84,515,235   17.26
NL         79,052,717   16.15
BE         67,963,161   13.88
FI          9,112,053    1.86
RU          7,306,873    1.49
DE          7,119,556    1.45
GB          5,897,734    1.20
CN          5,446,895    1.11
DK          2,958,891    0.60
89.9% of queries originate from the top 4 countries

Uptake of DKIM/DMARC (3/3)
Provider        ASN       # Queries     Percentage
Google          AS15169   302,465,578   61.79
Microsoft       AS8075    51,556,416    10.53
Unknown         UNKN      15,788,699    3.22
AOL             AS1668    12,971,456    2.65
Yahoo           AS36647   11,83,129     2.30
Yahoo           AS26101   10,24,857     2.07
Yahoo           AS36646   9,150,523     1.87
Yahoo           AS34010   4,522,388     0.92
IDC China Tel   AS23724   4,520,819     0.92
Mail.ru         AS47764   3,659,097     0.75
82.13% of queries originate from 4 large e-mail providers
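The 82.13% figure can be reproduced from the percentage column by summing the ASN rows per provider. The sketch below assumes the four providers meant are Google, Microsoft, AOL and Yahoo. The slide does not name them, but their shares add up to exactly the stated total.

```python
from collections import defaultdict
from decimal import Decimal

# (provider, ASN, percentage) rows as listed on the slide.
rows = [
    ("Google", "AS15169", "61.79"),
    ("Microsoft", "AS8075", "10.53"),
    ("Unknown", "UNKN", "3.22"),
    ("AOL", "AS1668", "2.65"),
    ("Yahoo", "AS36647", "2.30"),
    ("Yahoo", "AS26101", "2.07"),
    ("Yahoo", "AS36646", "1.87"),
    ("Yahoo", "AS34010", "0.92"),
    ("IDC China Tel", "AS23724", "0.92"),
    ("Mail.ru", "AS47764", "0.75"),
]

# Sum the shares of all ASNs belonging to the same provider.
# Decimal avoids floating-point rounding noise in the totals.
share = defaultdict(Decimal)
for provider, _asn, pct in rows:
    share[provider] += Decimal(pct)

# Assumed set of "4 large e-mail providers" (not named on the slide).
big_four = ["Google", "Microsoft", "AOL", "Yahoo"]
total = sum(share[p] for p in big_four)
print(f"Yahoo combined: {share['Yahoo']}%")
print(f"Four providers: {total}%")
```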

Summary
We have shown ENTRADA, a DSW built using open-source big data tools
It enables quick hypothesis testing and application development using SQL
We have shown real-world example use cases
ENTRADA can be extended to other use cases
Download and contribute!

Future Work
More DNS research in collaboration with research partners
Develop data-driven applications and services based on ENTRADA
Facilitate ENTRADA user community

Questions?
Maarten Wullink, Sr. Research Engineer
maarten.wullink@sidn.nl
@wulliak
www.sidnlabs.nl
entrada.sidnlabs.nl