BIG DATA & Forensics. Katrin Franke, PhD Norwegian Information Security Laboratory, Gjøvik, Norway





Computational Forensics
Katrin Franke, PhD, Norwegian Information Security Laboratory, Gjøvik, Norway

Katrin Franke
Professor of Computer Science, 2010
PhD in Artificial Intelligence, 2005
MSc in Electrical Engineering, 1994
Industrial Research and Development (20+ years)
Financial Services and Law Enforcement Agencies
Courses, Tutorials and post-graduate Training: Police, BSc, MSc, PhD
Founding Chair, IAPR/TC6 Computational Forensics
IAPR* Young Investigator Award, 2009
* International Association for Pattern Recognition
kyfranke.com

NISlab @ Gjøvik University College
52 persons: 12 permanent staff, 15 part-timers, 2 postdocs, 20 Ph.D. students, and 3 administrative staff
4 study programs: B.Sc. (40), M.Sc. (80) and Ph.D. in Information Security; B.Sc. (30) in Network and System Administration
1 National Research School of Computer and Information Security: COINS
Externally funded projects: NFR, EU FP7, NIST
2 focus laboratories: BiometricsLab and TestimonLab

Joint Forces: Center for Cyber & Information Security
Norwegian National Security Authority (NSM)
Directorate of Police (Politidirektoratet)
National Criminal Investigation Service (Kripos)
National Police Computing and Material Service (PDMT)
Norwegian National Authority for Investigation and Prosecution of Economic and Environmental Crime (Økokrim)
Norwegian Police Security Service (PST)
Police Academy (Politihøgskolen)
National ID Centre (NID)
Norwegian Cyber Force (Cyberforsvaret)
Norwegian Defence Research Establishment (FFI)
Telenor, PricewaterhouseCoopers (PwC), Statkraft, Statnett and Eidsiva
Oppland County
Publicly announced: 11 June 2013

NISlab Working Areas
Biometrics: User Authentication, BTA Protocol
Forensics: Forensic Readiness, Incident Response, Investigation/Analysis
Security Management: Risk-based Design, Security Economics, System/Adversary Modeling, Human Factors, Policies
Security Technology: Software Security, System Administration, Network and Critical Infrastructure Protection
Testimon (Latin: evidence), Computational & Digital Forensics: Fraud Detection, Analysis and Prevention

Underlying thoughts
"Without deviation from the norm, progress is not possible." [Frank Zappa]
"Trust is good, control is better." [Lenin]
"Freedom is the state in which an individual cannot be exposed to despotism of others." [Anonymous]
"Tell me and I forget, teach me and I may remember, involve me and I learn." [Benjamin Franklin]

Internet Adoption
247 billion emails per day
234 million websites
5 billion mobile-phone users
50 billion smart things with sensing and communication capabilities that collect data
BIG Data Phenomenon: Volume, Velocity, Variety


Cyber Crime

Cyber Crime Offenses & Costs
Report of the Belgian Economic and Financial Crimes Division (DJF)
Online crime complaints and dollar loss in the United States (IC3, 2010)
European Commission, Directorate-General Home Affairs, Directorate Internal Security, Unit A.2: Organised Crime
RAND Corporation, "Feasibility Study for a European Cybercrime Centre," Technical Report 1218, 2012. Prepared for the EC.
German Federal Criminal Police Office, Annual Situation Report on Cybercrime, 2009 and 2010

Types of Cyber Crime
Alkaabi, A., G. M. Mohay, A. J. McCullagh and A. N. Chantler (2010). "Dealing with the problem of cybercrime", Conference Proceedings of 2nd International ICST Conference on Digital Forensics & Cyber Crime, 4-6 October 2010, Abu Dhabi.

Forensic Science
Forensic methods consist of multi-disciplinary approaches to perform the following tasks:
  Investigate and reconstruct a crime scene or a scene of an accident,
  Collect and analyze trace evidence found,
  Identify, classify, quantify and individualize persons, objects and processes,
  Establish linkages, associations and reconstructions, and
  Use those findings in the prosecution or the defense in a court of law.
So far, forensic science has mostly dealt with previously committed crime; a greater focus is now on preventing future crime.

Challenges & Demands in Forensic Investigations
Challenges
  Tiny pieces of evidence are hidden in a mostly chaotic environment,
  Trace study to reveal specific properties,
  Traces found will never be identical,
  Reasoning and deduction have to be performed on the basis of partial knowledge, approximations, uncertainties and conjectures.
Demands
  Objective measurement and classification,
  Robustness and reproducibility,
  Security against falsification.

Strengthening Forensic Science in the United States: A Path Forward
Committee on Identifying the Needs of the Forensic Sciences Community, National Research Council
ISBN: 0-309-13131-6, 352 pages, 6 x 9, (2009)
This PDF is available from the National Academies Press at: http://www.nap.edu/catalog/12589.html

Cyber Crime and Forensics
The knowledge and intuition of the human examiner play a central role in daily casework.
Courtroom forensic testimony is often criticized by defense lawyers as lacking a scientific basis.
Evidence is increasingly data intensive and widely distributed:
  It is common practice to seize all data carriers, which amounts to many terabytes of data.
  Seized data is enriched with data available on the Internet, social networks, etc.
The huge amount of data, tight operational times, and data linkage pose challenges.
Implement a legal framework and standards.
Add efficiency and intelligence to investigations.
Computational Forensics, i.e. applying Artificial Intelligence in the forensic sciences.

Computational Forensics - Objectives
Study and development of computational methods to
  Assist in basic and applied research, e.g. to establish or prove the scientific basis of a particular investigative procedure,
  Support the forensic examiner in their daily casework.
Modern crime investigation shall profit from the hybrid intelligence of humans and machines.

Computational Forensics - Definition
It is understood as the hypothesis-driven investigation of a specific forensic problem using computers, with the primary goal of discovery and advancement of forensic knowledge.
CF works towards:
  1. In-depth understanding of a forensic discipline,
  2. Evaluation of the basis of a particular scientific method, and
  3. A systematic approach to forensic sciences by applying techniques of computer science, applied mathematics and statistics.
It involves modeling and computer simulation (synthesis) and/or computer-based analysis and recognition.

Computational vs. Computer (Digital) Forensics
Computational Forensics uses computational sciences to study any type of evidence:
  Computer forensics
  Crime scene investigation
  Forensic paleography
  Forensic anthropology
  Forensic chemistry
Computer Forensics studies digital evidence:
  File-system forensics
  Live-system forensics
  Mobile-device forensics
  etc.

Requirement of Forensic-Computing Infrastructure
KEY FEATURES: security - scalability - flexibility
Cell-Level Security (one element of the ecosystem's end2end trust assurance framework)
Unprecedented Scale (tens of PBs)
Multi-Structured Data Analytics
Automated Indexing @Ingest
msec. Ingest rates
3-in-1 database: Column, Document & Graph Store
Statistics, SQL plus Full-Text & Graph Search

WANDA Architecture

Plug-In Concept

Testimon FDS3 (Forensic-Data Store & Secure Services)
[Architecture diagram; labels: FDS3 end2end TA (Encryption at Rest, Encryption in Motion, e2eTA Audit, Policy & Labeling Engines, e2eTA IdM Integration); FDS3 Core (Data Structures, Documents (JSON), Graphs, ReLIfE, Languages, PDS-QL, Interfaces, Thrift, Analytics, D3 Demos, Tools); FDS3 Iterators (Processing, MapReduce Connector, Pig Connector, Advanced Analytics); FDS3 Ingest (Data Loaders, Flume); Indexing (Lucene); Apache Accumulo; Hadoop Distributed File System (HDFS); Commodity Hardware, Private Cloud, Public Cloud]

Requirement of Adapted Computational Methods
Proactive, ultra-large-scale forensic investigations: Computational Forensics
  Situation-aware methods
  Quantified, measurable indicators
  Adaptive, self-organizing models
  Distributed, cooperative, autonomous
Computational Intelligence
  NN: Neural Networks (brain)
  FL: Fuzzy Logic (reasoning with imprecision, uncertainty, partial truth)
  EC: Evolutionary Computation (natural evolution)

Data-driven Approaches
BIG DATA Analytics
[Figure: Inter-relation of feature complexity and expected recognition accuracy (Franke 2005)]

Application Example: Network Intrusion Detection
Data: 10% of the overall KDD CUP 99 test data set (5 million instances) for intrusion detection systems, which contains normal traffic and 4 attack classes (DoS, Probe, U2R, R2L).
Consider 4 data subsets of the KDD CUP 99:
  Data Set          Number of Instances
  Normal & DoS      488,736
  Normal & Probe    138,391
  Normal & U2R      97,330
  Normal & R2L      98,404
Feature selection: Opt-CFS & Opt-mRMR
Classifiers: C4.5 & Bayesian Network
[Figures: Number of Selected Features; Achieved Recognition Performance]
Reference: Nguyen, Franke, Petrović (2009-2012)
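
The four "Normal & attack class" subsets can be built by mapping each record's raw label to its attack class and pairing it with the normal traffic. The sketch below is an illustration only, not the authors' code: the function name `binary_subsets`, the abbreviated label map and the toy records are assumptions, although the label names themselves are ones used in the KDD CUP 99 files.

```python
# Partial map from raw KDD CUP 99 labels to the four attack classes.
# Only a few labels per class are listed here (abbreviated on purpose).
ATTACK_CLASS = {
    "smurf": "DoS", "neptune": "DoS",
    "portsweep": "Probe", "ipsweep": "Probe",
    "buffer_overflow": "U2R", "rootkit": "U2R",
    "guess_passwd": "R2L", "warezclient": "R2L",
}

def binary_subsets(records):
    """Split (features, label) records into four two-class data sets.

    Normal records go into every subset with target 0; an attack record
    goes only into the subset of its own class, with target 1.
    """
    subsets = {name: [] for name in ("DoS", "Probe", "U2R", "R2L")}
    for features, label in records:
        label = label.rstrip(".")  # raw files end each label with a dot
        if label == "normal":
            for rows in subsets.values():
                rows.append((features, 0))
        elif label in ATTACK_CLASS:
            subsets[ATTACK_CLASS[label]].append((features, 1))
    return subsets

# Toy records with two made-up feature values each.
toy = [
    ([0, 181], "normal."),
    ([0, 1032], "smurf."),
    ([5, 0], "portsweep."),
    ([2, 9], "rootkit."),
    ([1, 3], "guess_passwd."),
]
subsets = binary_subsets(toy)
```

Each subset can then be passed to feature selection and a classifier independently, which is how the per-class results on this slide are organized.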

Towards a Generic Feature-Selection Measure for Intrusion Detection
Hai Thanh Nguyen, Katrin Franke and Slobodan Petrović
Norwegian Information Security Laboratory (NISlab), Gjøvik University College
www.nislab.no

Model for Pattern Recognition
Classification: Test pattern -> Preprocessing -> Feature Measurement -> Classification
Training: Training pattern -> Preprocessing -> Feature Extraction / Selection -> Learning
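
The two modes of this model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method from the talk: min-max scaling stands in for preprocessing, a fixed index list `keep` stands in for feature selection, a nearest-centroid rule stands in for the learner, and the toy data is invented.

```python
from statistics import mean

def fit_scaler(rows):
    """Preprocessing (training mode): learn per-feature min and max."""
    return [(min(col), max(col)) for col in zip(*rows)]

def scale(row, scaler):
    """Preprocessing: map each feature to the training range [0, 1]."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(row, scaler)]

def select(row, keep):
    """Feature selection / measurement: keep only the chosen indices."""
    return [row[i] for i in keep]

def train(rows, labels, keep):
    """Training mode: preprocessing -> feature selection -> learning."""
    scaler = fit_scaler(rows)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(select(scale(row, scaler), keep))
    centroids = {label: [mean(col) for col in zip(*members)]
                 for label, members in by_class.items()}
    return scaler, centroids

def classify(row, scaler, centroids, keep):
    """Classification mode: same preprocessing, then nearest centroid."""
    x = select(scale(row, scaler), keep)
    def dist2(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
    return min(centroids, key=dist2)

# Toy patterns: [duration, bytes, irrelevant noise]; only 0 and 1 are kept.
rows = [[1, 100, 7], [2, 120, 3], [30, 9000, 5], [28, 8000, 4]]
labels = ["normal", "normal", "attack", "attack"]
keep = [0, 1]
scaler, centroids = train(rows, labels, keep)
```

Note that the test pattern is scaled with parameters learned from the training patterns only, which mirrors the separation of the two branches in the diagram.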

Feature Selection Methods
Wrapper Methods
Filter Methods
  Correlation Feature Selection (CFS) measure
  Minimal-Redundancy-Maximal-Relevance (mRMR) measure
  Generic Feature Selection measure (GeFS)
Embedded Methods

Motivation
"A lot of popular algorithms are not principled and it is difficult to understand what problem they seek to solve and how optimally they solve it." (Isabelle Guyon, 2005)
Many feature selection algorithms exist that perform well in many applications. Should we hold off on creating new ones and first try to understand the existing ones better, e.g. regarding:
  The generalization ability of feature selection measures.
  The impact of feature selection methods, such as filter methods, on the accuracy of classifiers.
There is a need for more effective procedures that ensure globally optimal feature subsets.

Our Research Focus
1. Generalization of several feature selection measures.
2. Optimization to derive globally optimal feature subsets.
Considering the CFS measure (Hall, 1999) and the mRMR measure (Peng, 2005) for intrusion detection because:
  Filter methods are usually used to select features from high-dimensional data sets, such as those in intrusion detection systems.
  Both the relevance of features and the relationship between features are considered.
  The relevance and relationship are usually characterized in terms of correlation (CFS) or mutual information (mRMR).

CFS and mRMR Feature Selection
Correlation feature selection (CFS) measure: class-feature correlation, feature-feature correlation
Feature-selection measure based on mutual information (mRMR): class-feature mutual information, feature-feature mutual information
M. Hall. Correlation Based Feature Selection for Machine Learning. Doctoral Dissertation, University of Waikato, Department of Computer Science, 1999.
H. Peng, F. Long, and C. Ding. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on PAMI, Vol. 27, No. 8, pp. 1226-1238, 2005.
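
For reference, the two measures are commonly written as follows. The notation is an assumption chosen here for illustration: S is a feature subset with k = |S| features, c is the class, r denotes average correlation, and I denotes mutual information.

```latex
% CFS merit of a subset S with k features (Hall, 1999)
\mathrm{Merit}_S(k) = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}

% mRMR criterion (Peng, Long and Ding, 2005): relevance minus redundancy
\max_{S}\;\left[\frac{1}{|S|}\sum_{f_i \in S} I(f_i; c)
  \;-\; \frac{1}{|S|^{2}}\sum_{f_i, f_j \in S} I(f_i; f_j)\right]
```

In both cases the first term rewards features that are informative about the class, and the second penalizes features that duplicate one another.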

Generic Feature Selection (GeFS)
Question: Can the CFS measure and the mRMR measure be fused and generalized into a generic feature selection measure?
Definition 1: A generic feature selection (GeFS) measure is defined as follows: [formula not preserved in the transcript]
Proposition 1: The CFS and the mRMR measures are instances of the GeFS measure.
Proposition 2: Feature selection by means of the GeFS measure is a polynomial mixed 0-1 fractional programming (PM01FP) problem.
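
To see why a 0-1 fractional program is a natural common form, the standard CFS and mRMR measures can be rewritten with binary indicators. This is a sketch under assumed notation, not the authors' Definition 1: x_i = 1 if feature i is selected and 0 otherwise, a_i is the class-feature correlation (assumed non-negative), b_ij the feature-feature correlation, and c_i, d_ij the corresponding mutual information values.

```latex
% Subset size and averaged terms expressed through x \in \{0,1\}^n
k = \sum_{i=1}^{n} x_i, \qquad
k\,\overline{r_{cf}} = \sum_{i=1}^{n} a_i x_i, \qquad
k(k-1)\,\overline{r_{ff}} = \sum_{i \neq j} b_{ij}\, x_i x_j

% Squared CFS merit: a ratio of two polynomials in the binary variables
\mathrm{Merit}^2(x) =
  \frac{\left(\sum_{i=1}^{n} a_i x_i\right)^{2}}
       {\sum_{i=1}^{n} x_i + \sum_{i \neq j} b_{ij}\, x_i x_j}

% mRMR in the same variables
\mathrm{mRMR}(x) =
  \frac{\sum_{i=1}^{n} c_i x_i}{\sum_{i=1}^{n} x_i}
  - \frac{\sum_{i,j} d_{ij}\, x_i x_j}{\left(\sum_{i=1}^{n} x_i\right)^{2}}
```

Since the merit is non-negative under the stated assumption, maximizing its square selects the same subset, and both objectives are fractions of polynomials in 0-1 variables.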

Problem Transformation
Chang's method for solving PM01FP:
  Linearizing the PM01FP problem into a mixed 0-1 linear programming (M01LP) problem.
  The number of variables & constraints: n^2
  Branch and Bound algorithm.
Our method for solving PM01FP:
  Differently linearizing the PM01FP problem into a mixed 0-1 linear programming (M01LP) problem.
  The number of variables & constraints: 4n+1
  Branch and Bound algorithm.
C-T. Chang. On the polynomial mixed 0-1 fractional programming problems, European Journal of Operational Research, vol. 131, issue 1, pages 224-227, 2001.
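
What "globally optimal feature subset" means can be shown on a tiny instance by enumerating every 0-1 selection vector. The sketch below is deliberately not the linearization or branch-and-bound procedure named on the slide: it is an exhaustive search over 2^n vectors, practical only for very small n, and the correlation values are invented for illustration.

```python
from itertools import product
from math import sqrt

def cfs_merit(x, class_corr, feat_corr):
    """CFS merit (Hall, 1999) of the subset encoded by the 0-1 vector x."""
    chosen = [i for i, bit in enumerate(x) if bit]
    k = len(chosen)
    if k == 0:
        return 0.0
    r_cf = sum(class_corr[i] for i in chosen) / k
    pairs = [(i, j) for i in chosen for j in chosen if i < j]
    r_ff = sum(feat_corr[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

def best_subset(class_corr, feat_corr):
    """Exhaustively search all 2^n selection vectors for the best merit."""
    n = len(class_corr)
    return max(product((0, 1), repeat=n),
               key=lambda x: cfs_merit(x, class_corr, feat_corr))

# Hypothetical values: feature 0 is highly relevant, feature 1 is largely
# redundant with feature 0, feature 2 is moderately relevant and independent.
class_corr = [0.8, 0.5, 0.5]
feat_corr = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
best = best_subset(class_corr, feat_corr)
```

On this instance the search keeps features 0 and 2 and drops the redundant feature 1. A branch-and-bound solver over the linearized problem aims at the same optimum while pruning most of the 2^n candidates.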

Application Example: Malicious Code Detection
Static analysis
System artifacts
Dynamic analysis
Debugging
Analyzing malicious content: PDFs, JavaScript, Office documents, shell code, network traffic
Behavioral malware analysis (dynamic) via information-based dependency matching: 98.4% detection rate
Malicious PDF detection (data set: 7,454 unique benign and 16,296 unique malicious PDFs): 97.7% detection rate
Reference: Sand, Kittilsen, Franke (2011-2013)

Application Example: Author Identification from Text-based Communications
Determining authorship of an anonymous text
Enron dataset: real emails of Enron employees; contains 255,636 emails from 87,474 authors.
Reference: Chitrakar, Franke (2011-2013)

Demand: Automatization, Standardization, and Benchmarking
Increase efficiency and effectiveness
Perform method / tool testing regarding their strengths/weaknesses and their likelihood ratio
Gather, manage and extrapolate data, and synthesize new data sets on demand
Establish and implement standards for data, work procedures and journal processes
Fulfillment of the Daubert Standard: http://en.wikipedia.org/wiki/daubert_standard
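
The likelihood ratio mentioned above is commonly defined as below. The symbols are assumptions chosen for illustration: E is the observed evidence, H_p the prosecution hypothesis and H_d the defense hypothesis.

```latex
% Likelihood ratio of the evidence under the two competing hypotheses
\mathrm{LR} = \frac{P(E \mid H_p)}{P(E \mid H_d)}

% Bayes' rule in odds form: the LR updates prior odds to posterior odds
\frac{P(H_p \mid E)}{P(H_d \mid E)} = \mathrm{LR} \times \frac{P(H_p)}{P(H_d)}
```

A value above 1 means the evidence is more probable under H_p than under H_d, a value below 1 the reverse, which gives method and tool testing a quantity that can be reported and compared.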

Demand: Joint Research & Development
Education and training
Revealing the state of the art in *each* domain
Sources of information on events, activities and financing opportunities
International forum for peer review and exchange, e.g., IWCF workshops
Performance evaluation, benchmarking, proof and standardization of algorithms
Resources in the form of data sets, software tools, and specifications, e.g. data formats

Demand: Legal Framework
Law as framework for ICT
  Evidence acquisition and storage
  Culture, social behaviors, privacy aspects
  Cross-jurisdiction cooperation, European / International cyberlaw
Law as content of ICT
  Automation, programming of legal rules
  Methods for dimensionality reduction: loss of relevant information
  Questions on extracted numerical parameters: loss of information due to inappropriate features
  Reliability of applied computational method / tool
  Dealing with final conclusions based on wrong computational results

Perspectives on Forensics & Digital Evidence
Legal / Regulations
Technological / Security / Archival
Knowledge / Capacity Building / Training
Public Awareness (pedagogical methods)
Organizational / Information Management / Procedures

Cloud Forensics - Vision
Reactive -> Proactive
Discrete Event -> Continuous Monitoring
Accountability/Auditing -> Forensic Readiness
Forensic Readiness Retro-fitted -> Forensics-by-Design

Concluding Remarks
"It is better to know some of the questions than all of the answers." [James Thurber]
"We all do better when we work together. Our differences do matter, but our common humanity matters more." [Bill Clinton]
"You are never given a wish without also being given the power to make it come true." [R. Bach]