Analyzing Huge Data Sets in Forensic Investigations




Kasun De Zoysa, Yasantha Hettiarachi
Department of Communication and Media Technologies
University of Colombo School of Computing
Colombo, Sri Lanka

Centre for Digital Forensic
ISIF Information Society Innovation Grant (www.isif.asia)

Our Role
- Police
- CID
- Customs
- Bribery and Corruption
- Judicial Services
- Victims

Year vs Number of Crimes
[Figure: Number of Crimes Reported During the Past 7 Years, 2003 to 2009; x-axis: Year, y-axis: No. of Crimes]


Problems Faced
- Evidence not being collected in an acceptable manner
- Evidence being damaged due to time and environmental factors
- Evidence being damaged (wiped/formatted) before collection

Why?
- Equipment is not available
- Software is not available
- Procedures and policies are not in place
- Lack of IT knowledge in the law enforcement sector

Some Existing Popular Forensic Investigation Tools
Tools: Description
- Encase/FTK: Commercial products
- Sleuthkit: Open source; widely used; provides tools for forensic activities; easy to understand and deploy
- PyFlag: Not widely used; complex; difficult to deploy
- PTK, Autopsy: Consume a lot of time during file analysis

Challenges of Developing a Forensic Toolkit for a Developing Country
- Limited resources: lack of high-end machines and of appropriate media to store evidence
- Procedures and policies: developing a forensic framework -> an acceptable balance between technology and law
- Poor IT literacy of police and legal officers: the toolkit must provide a user-friendly and useful service to the courts and judges

FIT4D
A software toolkit that makes use of the limited resources available in developing countries
http://score.ucsc.lk/fit4d/

Comparison Between PTK and FIT4D Features
Features compared (PTK vs FIT4D):
1. Creating disk images
2. Searching/filtering the disk image
3. Analysis and searching of the disk image piecewise
4. Report generation
5. Graphics processing tools
6. Compare file content within the image
7. Attach legal documents such as court orders to the case
8. Evidence not stored in a central server
9. Dynamic timeline
10. Multiple investigators and case lock

Storage Capacity Grows Over Time
[Figure: growth of storage capacity over time. Source: Wikipedia]
Analyzing huge data sets takes tremendous time and effort in forensic investigations.

There Are a Huge Number of Hard Disks
- Which one contains the email address perera@gmail.com?
- Which one belongs to Mr. G.H. Perera?
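The first question can be illustrated with a naive linear scan over every disk image. The sketch below is only an illustration and is not part of FIT4D or any tool named in these slides; the function names and the `.dd` file names in the usage note are hypothetical. It assumes the images are raw, uncompressed files and that the address is stored as plain ASCII bytes, so it would miss compressed, encoded or fragmented data.

```python
def image_contains(path, needle: bytes, chunk_size: int = 1 << 20) -> bool:
    """Return True if the raw bytes of the file at `path` contain `needle`.

    The file is read in chunks so that images larger than memory can be
    scanned; the last len(needle) - 1 bytes of each window are carried over
    so that a match straddling a chunk boundary is not missed.
    """
    if not needle:
        raise ValueError("needle must not be empty")
    overlap = len(needle) - 1
    tail = b""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if needle in window:
                return True
            tail = window[-overlap:] if overlap else b""


def images_containing(paths, needle: bytes, chunk_size: int = 1 << 20):
    """Return the paths (as strings) of all images that contain `needle`."""
    return [str(p) for p in paths if image_contains(p, needle, chunk_size)]
```

A call such as `images_containing(["disk_A.dd", "disk_B.dd"], b"perera@gmail.com")` reads every byte of every image, so the cost grows with the total volume of seized storage. That is the scaling problem the following slides describe.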

Today, most forensic tools analyze a single drive at a time. These tools are not adequate for today's forensic challenges.

Existing Tools Are Inefficient
- Most of the existing investigation tools cannot handle these huge data sets efficiently.
- E.g.: it takes nearly two to three hours to open a 6GB hard disk using a popular forensic toolkit like FTK.

Data Mining: A Better Solution?
- Data mining is a good solution for handling massive volumes of data.
- Little research has focused on applying data mining techniques to digital forensics!

Proposed System: Data Mining for Forensic Investigations
- Our aim is to build a system that applies data mining techniques to the forensic analysis of data.
- It will provide some pre-categorization of the data and intelligent analysis.

Advantages: Proposed System
- It will free the investigator from all low-level and manual tasks.
- This will speed up the investigation process.
- It will improve the quality of the information associated with the data analysis.
- It will reduce the huge monetary cost associated with a digital investigation.

Proposed System Architecture
[Figure: architecture diagram; components in the order they appear]
- Evidence Correlation Engine
- Entity Extraction Engine, Clustering Engine, Association Rule Mining Engine
- Data Store
- Transform Data
- Data Selection and Cleaning
- Sleuthkit: Extract Disk Information
- Disk Images

Entity Extraction
- Extract information from unstructured documents into categories such as the names of persons, organizations and locations, expressions of time, quantities, monetary values, percentages, e-mail addresses, authorship, personal characteristics, etc.
- Open source software is available for named entity extraction, e.g. GATE and ANNIE.
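As a rough illustration of the idea, the sketch below pulls three of the listed categories (e-mail addresses, percentages and monetary values) out of free text with regular expressions. It is a simplified stand-in and does not use GATE or ANNIE; the patterns, the currency markers (`Rs`, `USD`, `$`) and the function name are assumptions chosen for the example. Names of persons, organizations and locations need a trained or rule-based extractor and are not attempted here.

```python
import re

# Deliberately simple, assumed patterns; real evidence needs broader rules.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"),
    "percentage": re.compile(r"\b\d+(?:\.\d+)?\s?%"),
    "money": re.compile(r"(?:\bRs\.?|\bUSD|\$)\s?\d+(?:,\d{3})*(?:\.\d+)?"),
}


def extract_entities(text):
    """Return a dict mapping each category to its sorted, de-duplicated matches."""
    return {
        label: sorted(set(pattern.findall(text)))
        for label, pattern in PATTERNS.items()
    }


sample = "Mail perera@gmail.com about Rs. 25,000 (a 12.5% fee)."
print(extract_entities(sample))
```

The output of such an extractor is what the later stages would consume: each entity becomes a searchable attribute of the file or disk image it came from.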

Clustering and Categorizing Data
- Classify data according to the patterns found on the storage medium.
- E.g.: mine e-mail content and identify its author by comparing it with a set of examples from known authors.
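The authorship example can be sketched as a nearest-profile classifier: build a character-trigram frequency profile for each known author and assign an unknown e-mail to the author whose profile has the highest cosine similarity. This is one simple technique picked for illustration; the slides do not say which method the proposed system uses, and the author labels and sample texts in the usage note are invented.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Count character trigrams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(p, q):
    """Cosine similarity between two trigram count profiles."""
    dot = sum(p[g] * q[g] for g in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0


def attribute_author(unknown, samples_by_author):
    """Return the known author whose combined profile is closest to `unknown`."""
    profiles = {}
    for author, samples in samples_by_author.items():
        profile = Counter()
        for sample in samples:
            profile.update(trigram_profile(sample))
        profiles[author] = profile
    target = trigram_profile(unknown)
    return max(profiles, key=lambda author: cosine(target, profiles[author]))
```

For instance, with `{"author_a": ["kindly revert at the earliest", "kindly do the needful"], "author_b": ["hey dude whats up lol", "see ya later dude"]}` as known samples, the text "kindly do the needful at the earliest" shares almost all of its trigrams with `author_a`. With so little text the result is only indicative; realistic use needs many samples per author.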

Association Rule Mining
- Find frequently occurring patterns in data sets and present them as rules.
- E.g.: this technique has been applied to network intrusion detection to derive association rules from users' interaction history. The extracted rules can then be used to discover future network attacks.
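To make "patterns presented as rules" concrete, the sketch below derives single-antecedent rules of the form A -> B from a list of transactions, keeping those that meet a minimum support and confidence. It enumerates item pairs by brute force rather than implementing a full algorithm such as Apriori; the thresholds, the function name and the event names in the usage note are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return sorted (lhs, rhs, support, confidence) tuples for rules lhs -> rhs.

    support    = fraction of transactions containing both items
    confidence = support(lhs and rhs) / support(lhs)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)
```

Given four hypothetical sessions, three of which contain both `failed_login` and `port_scan`, the function reports `failed_login -> port_scan` and its reverse with support 0.75 and confidence 1.0, while pairs seen only once fall below the support threshold.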

Correlation of Evidence
- The investigator has to browse and search for evidence and finally correlate all the evidence to reach final conclusions.
- This "connecting the dots" operation is very complex.
- Data mining offers statistical and intelligent methods to find correlations between the pieces of information found in the evidence.
- E.g.: FACE is a framework for automatic evidence discovery and correlation from a variety of forensic targets. Its authors have only used it for memory evidence correlation.
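One very simple form of correlation is an inverted index from each extracted entity to the evidence sources it appears in; entities found in two or more sources are candidate links between them. The sketch below assumes entities have already been extracted per source. It is not how FACE or the proposed framework works, and the function and source names are hypothetical.

```python
from collections import defaultdict


def correlate(entities_by_source, min_sources=2):
    """Map each entity seen in at least `min_sources` sources to those sources."""
    index = defaultdict(set)
    for source, entities in entities_by_source.items():
        for entity in entities:
            index[entity].add(source)
    return {
        entity: sorted(sources)
        for entity, sources in index.items()
        if len(sources) >= min_sources
    }


evidence = {
    "disk_A.dd": {"perera@gmail.com", "G.H. Perera"},
    "disk_B.dd": {"perera@gmail.com"},
    "disk_C.dd": {"silva@example.com"},
}
print(correlate(evidence))
```

Here only `perera@gmail.com` is shared, linking `disk_A.dd` and `disk_B.dd`. Exact-match linking like this is brittle, since spelling variants of a name are treated as different entities, which is part of why the slides call for statistical and intelligent methods.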

The Proposed Framework Will
- Apply data mining and artificial intelligence concepts to facilitate digital forensics.
- Release investigators from all the low-level tasks that they currently have to do.
If applied properly, the system will achieve three main goals:
1) It will speed up the investigation process and reduce the time taken for a digital investigation.
2) It will improve the quality of the information associated with the data analysis.
3) It will reduce the huge monetary cost associated with a digital investigation.

Limitations
- Although data mining has been applied successfully in various domains, it has not been much used or tested within the domain of digital forensics.
- Data mining and AI techniques need huge data sets for training the system; otherwise it will perform poorly.
- We believe that these limitations will not restrict the potential of extending data mining research to digital forensics and digital investigations.

Conclusion
- We propose a digital forensic investigation framework that would free investigators from all the low-level tasks that they currently have to do.
- This will speed up the investigation process and reduce the time taken for a digital investigation.
- It will improve the quality of the information associated with the data analysis.
- It will reduce the huge monetary cost associated with a digital investigation.
- We encourage other researchers and practitioners to assist us in improving awareness and skills in this area.

Thank you
Contact Kasun (kasun@ucsc.cmb.ac.lk) for more information about our projects.