Homeland Security Advanced Research Projects Agency
DHS S&T Cyber Security Division (CSD)
PREDICT Overview
Douglas Maughan, Division Director
November 2, 2015
http://www.dhs.gov/cyber-research
DHS S&T Research Infrastructure
Lowering the bar to meaningful cyber security R&D
- Research Data Repository (PREDICT)
  - Repository of network data for use by the U.S.-based cyber security research community
  - https://www.predict.org
- Experimental Research Testbed (DETER)
  - Researcher and vendor-neutral experimental infrastructure
  - Used by over 200 organizations from more than 20 states and 17 countries
  - Used by over 40 classes, from 30 institutions, involving 2,000+ students
  - http://www.deter-project.org
- Software Assurance Market Place (SWAMP)
  - A software assurance testing and evaluation facility and the associated research infrastructure services
  - http://www.continuousassurance.org
  - Launched February 3, 2014
PREDICT Background
- Rationale / Background / Historical:
  - Researchers with insufficient access to data are unable to adequately test their research prototypes
  - Government technology decision-makers and researchers need data to evaluate competing products
  - Supports scientific method via repeatability of tests and evaluations
  - Unclear legal and ethical policies for Internet research
- Project Impetus:
  - National Strategy to Secure Cyberspace (February 2003)
  - 2009 Cyberspace Policy Review
  - Supports Expanding Public Access to the Results of Federally Funded Research; see http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
- The Research Data Repository (a.k.a. PREDICT) project is the only freely available, legally collected repository of large-scale datasets containing real network traffic and system logs in the U.S.
[Figure: Ethics & Disclosure Control, Dataset Sharing, PREDICT Cyber Security Datasets, Legal Framework]
PREDICT Project Architecture
PREDICT Project Legal Framework
[Figure labels: Application Review Board; Researchers; Account Request; PREDICT Coordinating Center (PCC); Legal Analysis; Data Providers; Data Access; Data Hosts; Data Hosting; MOAs]
PREDICT Dataset Focus
- Large scale internet datasets
  - DITL (Day in the Life of the Internet)
  - Collection mechanisms include:
    - The Los Nettos network: 10 Gb backbone comprised of dark fiber and leased gigabit circuits
    - Archipelago (Ark): UCSD/CAIDA's world-wide active measurement infrastructure with 71 Ark monitors located in 35 countries
    - Merit Network: a regional ISP in Michigan with a 10 Gbps facilities-based core
    - Packet Clearing House: numerous collection locations associated with global IXP activities
- Others
  - Collegiate competitions: entire CTF capture, including red team
  - Synthetically generated datasets from other USG, e.g. DARPA
  - Malware Command and Control
- Future
  - Insider Threat, Mobile, CPSSEC, Others?
Dataset Categories
- Address Space Allocation Data
- Border Gateway Protocol (BGP) Routing Data
- Blackhole Address Space Data
- Domain Name System (DNS) Data
- Internet Topology Data
- Intrusion Detection System (IDS) and Firewall Data
- Infrastructure Data
- Internet Protocol (IP) Packet Headers
- Synthetically Generated Datasets
- Traffic Flow Data
- Application Layer Security Data
- Unsolicited Bulk Email Data
- Botnet Sinkhole Data
- National Collegiate Cyber Defense Competition
- Netalyzr Performance Data
Data Host/Providers
- UCSD/CAIDA: Topology Measurements, Network Telescope
- USC ISI / Colorado State University: NetFlow, Internet Topology Data, Address Allocation, Spam logs, IP reputation lists
- University of Michigan/Merit Network: NetFlow, BGP Routing, Dark Address Space Monitoring, BGP Beacon Routing
- Georgia Tech: Botnet Sinkhole Connection
- University of Wisconsin: Global Intrusion Detection Database, Physical Infrastructure dataset
- Packet Clearing House: BGP Routing, VoIP Measurement, Synthetic Attack Data
Dataset sizes: 974 TB, 113 TB, 364 TB, 0.1 TB, 3.7 TB, 10.0 TB (TOTAL = 1500+ TB)
Research Impact
- Over 300 research papers/journals/technical reports within the last 3 years using PREDICT datasets
- Research groups that have used PREDICT include:
  - 117 academic institutions
  - 88 commercial entities
  - 37 Government organizations
  - 8 Foreign
  - 11 non-profit organizations
- Menlo Report
  - Highly visible in the cybersecurity research community
  - Raising awareness of the importance of the issues associated with ethical and legal cybersecurity R&D
(Normative) Computer Ethics
"A typical problem in computer ethics arises because there is a policy vacuum about how computer technology should be used. Computers provide us with new capabilities and these in turn give us new choices for action. Often, either no policies for conduct in these situations exist or existing policies seem inadequate. A central task of computer ethics is to determine what we should do in such cases, i.e., to formulate policies to guide our actions."
- James Moor, 1985
The Belmont Report
- "Ethical Principles and Guidelines for the Protection of Human Subjects of Research," US Department of Health, Education, and Welfare, April 18, 1979
- IRBs help ensure that research conforms with the ethical principles of the Belmont Report
The Menlo Report
"Ethical Principles Guiding Information and Communication Technology Research," supported by US Department of Homeland Security (published 2011).

Belmont Principle | Menlo Application
Respect for Persons | Identify stakeholders; Informed consent
Beneficence | Identify potential benefits and harms; Balance risks and benefits; Mitigate realized harms
Justice | Fairness and equity
Additional Menlo Principle: Respect for the Law and Public Interest | Compliance; Transparency and accountability
Case Studies of ICT Research
- Shining Light in Dark Places: Understanding the Tor Network
- Learning More About the Underground Economy: A Case Study of Keyloggers and Dropzones
- Your Botnet is My Botnet: Examination of a Botnet Takeover
- Why and How to Perform Fraud Experiments
- Measurement and Mitigation of Peer-to-Peer-Based Botnets: A Case Study on Storm Worm
- Spamalytics: An Empirical Analysis of Spam Marketing Conversion
- Studying Spamming Botnets Using Botlab
- P2P as Botnet Command and Control: A Deeper Insight
- DDoS Attacks Against South Korean and U.S. Government Sites
- BBC: Experiments with Commercial Botnets
- Lycos Europe: Make Love Not Spam Campaign
- University of Bonn: Stormfucker
- Information Warfare Monitor: Ghostnet
- Tipping Point: Kraken Botnet Takeover
- Symbiot: Active Defense
- Tracing Anonymous Packets to the Approximate Source
- LxLabs Kloxo/HyperVM
- Exploiting Open Functionality in SMS-Capable Networks
- Pacemakers and Implantable Cardiac Defibrillators: Software Radio Attacks and Zero-Power Defenses
- Black Ops 2008 -- It's The End Of The Cache As We Know It
- How to Own the Internet in Your Spare Time
- Botnet Design
- RFID Hacking
- WORM vs. WORM: preliminary study of an active counter-attack mechanism
- A Pact with the Devil
- Playing Devil's Advocate: Inferring Sensitive Data from Anonymized Network Traces
- Protected Repository for the Defense of Infrastructure Against Cyber Attacks
Likely to be considered Human Subjects Research subject to IRB review
Summary/Way Forward
- Expanding international cooperation framework to facilitate multinational cyber security research collaboration
  - Complete: Australia, Canada, Israel, Japan, United Kingdom
  - In Progress: Netherlands, New Zealand, Singapore, Spain
- Technical and Policy work on disclosure control for traffic data
- Additional Data Access Methods, such as Secure Virtual Enclaves
- Expansion of Ethics of Cyber Security research to include the development of practical guidelines for research community, including emphasis on Institutional Review Boards (IRBs)
- Additional datasets and categories
  - Public/restricted and live streaming/archival data
- Coordinate across USG agencies to share unclassified data resulting from cyber security R&D; currently sharing data produced by: DARPA, FCC, and IARPA
- We are actively seeking additional datasets for inclusion in PREDICT in addition to encouraging the use of existing datasets
Douglas Maughan, Ph.D.
Division Director, Cyber Security Division
Homeland Security Advanced Research Projects Agency (HSARPA)
douglas.maughan@dhs.gov
202-254-6145 / 202-360-3170
For more information, visit http://www.dhs.gov/cyber-research and https://www.predict.org
What are ethics?
- The field of ethics (or moral philosophy) involves systematizing, defending, and recommending concepts of right and wrong behavior.
- Normative ethics is concerned with developing a set of morals or guiding principles intended to influence the conduct of individuals and groups within a population (i.e., a profession, a religion, or society at large).
Ethics != Law
- Law can be defined as a consistent set of universal rules that are widely published, generally accepted, and usually enforced
- Ethics and law are interrelated but by no means identical (e.g., legal but not ethical, ethical but not legal)
- Adherence to ethical principles may be required to meet regulatory requirements surrounding academic research
- A law may illuminate the line between beneficial acts and harmful ones
- If the computer security research community develops ethical principles and standards that are acceptable to the profession and integrates those as standard practice, it makes it easier for legislatures and courts to effectively perform their functions