Data Mining for Digital Forensics




Digital Forensics - CS489, Sep 15, 2006, Topical Paper, Mayuri Shakamuri

Introduction

"Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner" (Hand, Mannila and Smyth 2001). Advancements in storage technology and digital data acquisition have contributed to the growth of huge databases. This is happening in many areas, from day-to-day tasks like credit and usage records, telephone call details, and market transactions to more complex ones like image processing, molecular databases, and medical records. This data is misused as well as used for legitimate purposes. As our dependency on these databases increases, the threat of disruption from cyber attacks has become a pressing issue. It has also become important to extract information from these huge databases that might be of value to the owner of the database. Data mining, also called Knowledge Discovery in Databases (KDD), can play a big role in making it convenient and practical to explore very large databases.

Digital forensics is the application of the scientific method to digital media in order to establish factual information for judicial review. This process often involves investigating computer systems to determine whether they are or have been used for illegal or unauthorized activities (Wikipedia). With the growing sizes of databases, law enforcement and intelligence agencies face the challenge of analyzing large volumes of data involved in criminal and terrorist activities (Chen et al., 2003). Data mining is therefore a suitable scientific method for digital forensics.

Data Mining Techniques

Data mining can be categorized into different types of tasks. These tasks depend on the analyst's objectives in analyzing the data (Hand, Mannila and Smyth 2001).

1. Exploratory Data Analysis (EDA): In this technique the goal is to explore the data without a clear idea of what we are looking for. EDA techniques can be interactive and visual. Some applications of EDA techniques are:

   Coxcomb plots - In 1858, Florence Nightingale used them to display mortality rates at military hospitals in and near London.
   Becker, Eick, and Wilks (1995) described a set of intricate spatial displays for visualization of time-varying long-distance telephone network patterns over 12,000 links (Hand, Mannila and Smyth 2001).

2. Descriptive Modeling: This technique's goal is to describe all the data that is being explored. Some examples of such descriptions are:
   Density estimation - Used to model the probability distribution of the data.
   Cluster analysis and segmentation - Partition of the space into groups. Segmentation has been widely used in marketing to determine demographics. Clustering has been widely used in psychiatric research to determine taxonomies for psychiatric disorders.
   Dependency modeling - Models describing relationships between variables.

3. Predictive Modeling: In this technique a model is built that allows the value of one variable to be predicted from the known values of other variables. Classification and regression are the methods used in this modeling. In classification the variable being predicted is categorical, whereas in regression the variable is quantitative. Some examples of this modeling are:
   SKICAT system - Used to classify stars and galaxies from a 40-dimensional feature vector.
   AT&T used regression techniques to build models that estimate the probability that a phone number is located at a business or a residence.

4. Discovering Patterns and Rules: As the name suggests, this method's goal is to find patterns in the data set, for example association rules, using algorithmic techniques. An example is:
   Tracking fraudulent use of cellular telephones.

5. Retrieval by Content: The idea behind this method is to find patterns in the data set that are similar to a pattern of interest supplied by the user. This method is widely used in text and image data sets. Some examples are:
   PageRank - Used by Google systems to estimate the relative importance of Web pages.
   QBIC - Developed by IBM to search large image databases using content-based queries.
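To make the pattern and rule discovery task (item 4) more concrete, the following is a minimal sketch of association rule mining over a handful of records. It is an illustration only and is not taken from any of the cited systems. The function name mine_rules, the call attributes, and the thresholds are all hypothetical. Support is the fraction of records that contain an item set, and the confidence of a rule A -> B is support(A and B) divided by support(A).

```python
from itertools import combinations


def mine_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Find single-item rules A -> B that meet the support and confidence thresholds.

    Returns a list of (antecedent, consequent, support, confidence) tuples.
    """
    if not transactions:
        return []
    n = len(transactions)
    sets = [set(t) for t in transactions]
    items = sorted(set().union(*sets))

    def support(itemset):
        # Fraction of transactions that contain every item in itemset.
        return sum(1 for s in sets if itemset <= s) / n

    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue  # the pair is too rare to be of interest
        for x, y in ((a, b), (b, a)):
            confidence = pair_support / support({x})
            if confidence >= min_confidence:
                rules.append((x, y, pair_support, confidence))
    return rules


# Hypothetical call records, each described by a set of attributes.
calls = [
    {"night", "international", "new_handset"},
    {"night", "international"},
    {"day", "local"},
    {"night", "international", "new_handset"},
    {"day", "international"},
    {"day", "local"},
]

rules = mine_rules(calls)
for antecedent, consequent, sup, conf in rules:
    print(f"{antecedent} -> {consequent} (support={sup:.2f}, confidence={conf:.2f})")
```

On real data the candidate item sets grow quickly, so practical systems prune the search (for example with the Apriori principle) rather than enumerating every pair as this sketch does.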

Applying Data Mining Techniques in Digital Forensics

Digital forensic professionals select appropriate data mining techniques based on the types of data sets and the specific nature of the information needed. For example, the data can be a large collection of e-mails, images, or network traffic records. Techniques applied to such data include support vector machine learning algorithms, behavior-based anomaly detection, and heuristic-based anomaly detection.

1. Intrusion Detection Systems

Researchers at Columbia University have conceived an approach to intrusion detection systems (IDS) based on data mining of audit sources. Detection models are constructed automatically by cost-sensitive machine learning algorithms that work from given cost metrics. In a cost-sensitive IDS, normal and intrusion activities are analyzed, and this information is used to build effective misuse and anomaly detection models. Based on this, the system finds the clusters of attack signatures and normal profiles and constructs a dynamically configurable group of models (Stolfo et al., 2001).

2. Image Mining

The amount of image traffic over the Internet is growing day by day, and illicit images are being transmitted at an alarming rate. Checking every image manually to identify which ones are of interest to digital forensics investigators and law enforcement officers is extremely time consuming and can be unproductive. There is a growing need for data mining tools that help investigators find relevant images in less time. Researchers at Queensland University of Technology, together with the Defence Science and Technology Organisation in Australia, have used data mining techniques to design an Image Mining System. "The system can be trained by a hierarchical Support Vector Machine (SVM) to detect objects and scenes which are made up of components under spatial or non-spatial constraints" (Brown et al., 2005). This model allows forensics investigators to communicate with the system via a grammar. "The grammar allows object description for training, searching, querying and relevance feedback" (Brown et al., 2005).

3. Criminal Network Analysis

In an NSF Digital Government Program funded project called COPLINK (Center: Information and Knowledge Management for Law Enforcement), researchers have applied data mining techniques to analyze data in the context of law enforcement. One application was to analyze and recognize previously unknown structural patterns in criminal networks involved in organized crimes such as narcotics trafficking, terrorism, gang-related crimes and other illegal activities. Social Network Analysis (SNA) was the data mining technique used for these kinds of networks. Their analysis involved four steps: network extraction, subgroup detection, interaction pattern discovery, and central member identification (Chen et al., 2003). For subgroup detection they used hierarchical clustering to detect subgroups based on relational strength in the criminal network. A social network analysis approach called block modeling was used to reveal patterns of between-group interactions. Detecting subgroups, interaction patterns, and the overall structure manually is a rather difficult task. They concluded that the subgroups and members found with this approach were correct representations of reality.

4. Mining E-mail Content

E-mail is among the most commonly used applications on the Internet. There has been research on content analysis to perform various tasks such as spam detection and control and automated filing. For digital forensics and law enforcement purposes this may not be sufficient. As e-mail is accepted as legal evidence, there is a growing need for better tools that analyze the content and find patterns and other useful information for digital forensics professionals. Analyzing huge volumes of e-mail data manually can be extremely tedious and at times inefficient and unproductive. Data mining techniques can be applied to build tools that find valuable information and save critical time that an investigator can spend on other important forensics tasks. Besides the content of the e-mail, information such as who sent the e-mail and where it was sent from can be of great value. Data mining tools are again useful for analyzing this information, as they can integrate these various aspects into one model.
Researchers at Columbia University, New York have developed an Email Mining Toolkit (EMT) that helps law enforcement officers and digital forensics professionals analyze e-mails and present the results as evidence. EMT detects anomalous behavior patterns within an account as well as similar behavior patterns across accounts; the latter is a means of detecting proxy accounts used by a person to hide their identity (Stolfo et al., 2005). Their work has shown that new behavior models built with this data mining driven toolkit can be used in spam detection. In separate work on author identification, structural characteristics and linguistic patterns were derived and combined with a Support Vector Machine learning algorithm to mine the e-mail content (Vel et al., 2001).
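The idea of finding similar behavior patterns across accounts can be sketched as follows. This is not EMT's actual model, which the sources summarized here do not describe in detail. It is a minimal illustration that assumes each account is summarized by the hours of the day at which it sends mail and that two accounts are compared by the cosine similarity of those hourly profiles. The function names, account addresses, send hours, and threshold are hypothetical.

```python
import math
from collections import Counter


def hourly_profile(send_hours):
    """Normalized 24-bin histogram of the hours (0-23) at which mail was sent."""
    total = len(send_hours)
    if total == 0:
        return [0.0] * 24
    counts = Counter(send_hours)
    return [counts.get(hour, 0) / total for hour in range(24)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_accounts(accounts, threshold=0.8):
    """Return (account_a, account_b, similarity) for pairs at or above the threshold."""
    names = sorted(accounts)
    profiles = {name: hourly_profile(accounts[name]) for name in names}
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = cosine_similarity(profiles[a], profiles[b])
            if sim >= threshold:
                pairs.append((a, b, round(sim, 3)))
    return pairs


# Hypothetical send times (hour of day) for three accounts.
accounts = {
    "alice@example.com": [9, 9, 10, 11, 14, 15, 16],
    "night_owl@example.com": [1, 2, 2, 3, 23, 23],
    "owl_alias@example.com": [2, 2, 3, 23, 23, 0],
}

for a, b, sim in similar_accounts(accounts):
    print(f"{a} and {b} have similar sending behavior (cosine={sim})")
```

A high similarity is only a lead for the investigator, not proof that two accounts share an owner. A realistic tool would combine several behavioral features (recipients, message sizes, attachment types) before flagging a pair.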

5. Modeling the Behavior of Serious Sexual Offenders

Data mining has been used in many business organizations as well as in the investigation of criminal activities. The capabilities of these techniques are encouraging and are extending to various other areas. Researchers at the University of Wolverhampton, along with the police department of Birmingham, UK, have applied data mining techniques to link crimes of a serious sexual nature (Adderley et al., 2001). They used Self Organizing Maps (SOM), a type of artificial neural network, for this analysis. The data was taken from the National Crime Faculty at the National Police Staff College, Bramshill, UK. A prototype based on behavioral patterns was developed that formed clusters and linked offenders to a particular cluster in a much shorter time than doing it manually. The commercial data mining package SPSS Clementine was used to speed up development of the model. The SOM technique was used to analyze sexual assaults and rape offences held in a ViCLAS relational database within the National Crime Faculty at Bramshill (Adderley et al., 2001). This helped them determine which of the crimes were likely committed by the same offender(s). The analysts established that crimes in individual clusters exhibited strong similarities, with adjacent clusters, which are variations on a theme, having similar traits (Adderley et al., 2001).

Conclusion

There are several commercial data mining tools used in various industrial sectors and businesses. Some of the major players in the data mining sector are Clementine, Darwin, CART Decision Tree Software, MARS Predictive Modeling Software, TreeNet Stochastic Gradient Boosting Software, LOGIT Software, RandomForests, and COGNOS, to name a few. Basis Technology is working on Multilingual Digital Forensics to leverage its analytical multilingual search techniques to enhance the field of digital forensics. These commercially available data mining tools can be used for forensics, and there is ongoing research in the quest for the killer applications in data mining. Data mining techniques have considerable potential in the field of forensic science, where models and tools can be developed to help investigators, digital forensics professionals and law enforcement officers find the data or clues they are searching for faster and more efficiently.

References:

1. Hand, D., Mannila, H., Smyth, P. (2001). Principles of Data Mining. Cambridge, MA: MIT Press.
2. Chen, H., Chung, W., Qin, Y., Chau, M., Xu, J. J., Wang, G., Zheng, R., Atabakhsh, H. (2003). Crime Data Mining: An Overview and Case Studies. ACM International Conference Proceeding Series; Vol. 130, 1-5.
3. Stolfo, S. J., Lee, W., Chan, P. K., Fan, W., Eskin, E. (2001). Data Mining-based Intrusion Detectors: An Overview of the Columbia IDS Project. ACM SIGMOD Record; Vol. 30, 5-14.
4. Brown, B., Pham, B., Vel, O. (2005). Design of a Digital Forensics Image Mining System. IIHMSP05, Melbourne.
5. Vel, O., Anderson, A., Corney, M., Mohay, G. (2001). Mining E-mail Content for Author Identification Forensics. ACM SIGMOD Record; Vol. 30, No. 4.
6. Stolfo, S. J., Hershkop, S. (2005). Email mining toolkit supporting law enforcement forensic analyses. ACM International Conference Proceeding Series; Vol. 89, 221-222.
7. Adderley, R., Musgrove, P. B. (2001). Data mining case study: Modeling the behavior of offenders who commit serious sexual assaults. Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining; 215-220.