Crime Pattern Analysis, Visualization And Prediction Using Data Mining 1

Similar documents
A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries

An intelligent Analysis of a City Crime Data Using Data Mining

Algorithmic Crime Prediction Model Based on the Analysis of Crime Clusters

Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing

An Enhanced Algorithm to Predict a Future Crime using Data Mining

An Overview of Knowledge Discovery Database and Data mining Techniques

Simple linear regression

SPATIAL DATA CLASSIFICATION AND DATA MINING

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL

Social Media Mining. Data Mining Essentials

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Data Mining Solutions for the Business Environment

not possible or was possible at a high cost for collecting the data.

Predictive Analytics Tools and Techniques

Regression model approach to predict missing values in the Excel sheet databases

Smart Security by Predicting Future Crime with GIS and LBS Technology on Mobile Device

Cleaned Data. Recommendations

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Bisecting K-Means for Clustering Web Log data

Predictive Analytics on Student Academics [1]

PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA

Module 3: Correlation and Covariance

The importance of graphing the data: Anscombe s regression examples

Comparison of K-means and Backpropagation Data Mining Algorithms

Financial Trading System using Combination of Textual and Numerical Data

Univariate Regression

Data Mining Applications in Higher Education

Predict the Popularity of YouTube Videos Using Early View Data

with functions, expressions and equations which follow in units 3 and 4.

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Impelling Heart Attack Prediction System using Data Mining and Artificial Neural Network

Robust Outlier Detection Technique in Data Mining: A Univariate Approach

Data Mining Governance for Service Oriented Architecture

A Proposed Data Mining Model to Enhance Counter- Criminal Systems with Application on National Security Crimes

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

Data Mining in Telecommunication

Homework 11. Part 1. Name: Score: / null

FOREX TRADING PREDICTION USING LINEAR REGRESSION LINE, ARTIFICIAL NEURAL NETWORK AND DYNAMIC TIME WARPING ALGORITHMS

4. Simple regression. QBUS6840 Predictive Analytics.

IT services for analyses of various data samples

ISSN: A Review: Image Retrieval Using Web Multimedia Mining

Role of Social Networking in Marketing using Data Mining

A survey on Data Mining based Intrusion Detection Systems

Data Mining Project Report. Document Clustering. Meryem Uzun-Per

A Review of Data Mining Techniques

Web Usage Mining: Identification of Trends Followed by the user through Neural Network

Correlation key concepts:

A QoS-Aware Web Service Selection Based on Clustering

Manjeet Kaur Bhullar, Kiranbir Kaur Department of CSE, GNDU, Amritsar, Punjab, India

BIG DATA IN HEALTHCARE THE NEXT FRONTIER

Dimensionality Reduction: Principal Components Analysis

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2

What does the number m in y = mx + b measure? To find out, suppose (x 1, y 1 ) and (x 2, y 2 ) are two points on the graph of y = mx + b.

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

A Survey on Document Clustering For Identifying Criminal

A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic

Tracking System for GPS Devices and Mining of Spatial Data

Application of Data Mining Techniques in Intrusion Detection

International Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 8 August 2013

2.1. Data Mining for Biomedical and DNA data analysis

Application of Predictive Model for Elementary Students with Special Needs in New Era University

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa

Lean Six Sigma Analyze Phase Introduction. TECH QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY

K-means Clustering Technique on Search Engine Dataset using Data Mining Tool

A Robustness Simulation Method of Project Schedule based on the Monte Carlo Method

Enhanced Boosted Trees Technique for Customer Churn Prediction Model

Example: Boats and Manatees

Chapter 7: Simple linear regression Learning Objectives

CURVE FITTING LEAST SQUARES APPROXIMATION

Florida Math for College Readiness

DATA MINING TECHNIQUES AND APPLICATIONS

Beating the MLB Moneyline

2. IMPLEMENTATION. International Journal of Computer Applications ( ) Volume 70 No.18, May 2013

A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment

Analytics on Big Data

Random Forest Based Imbalanced Data Cleaning and Classification

A Comparative Study of clustering algorithms Using weka tools

ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM

Standardization and Its Effects on K-Means Clustering Algorithm

Factor Analysis. Chapter 420. Introduction

Chapter 7 Factor Analysis SPSS

Prediction of Stock Performance Using Analytical Techniques

Edifice an Educational Framework using Educational Data Mining and Visual Analytics

The Data Mining Process

Using Data Mining for Mobile Communication Clustering and Characterization

Hexaware E-book on Predictive Analytics

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets

Data Mining - Healthcare Data

Transcription:

Crime Pattern Analysis, Visualization And Prediction Using Data Mining 1 Tushar Sonawanev, 2 Shirin Shaikh, 3 Shaista Shaikh, 4 Rahul Shinde, 5 Asif Sayyad 1 [BE], Computer science, S.R.E.S COE, Maharastra, India 2 [BE], Computer science, S.R.E.S COE, Maharastra, India 3 [BE], Computer science, S.R.E.S COE, Maharastra, India 4 [BE], Computer science, S.R.E.S COE, Maharastra, India 5 [BE], Computer science, S.R.E.S COE, Maharastra, India ABSTRACT Crime against women these days has become problem of every nation around the globe many countries are trying to curb this problem. Preventive are taken to reduce the increasing number of cases of crime against women. A huge amount of data set is generated every year on the basis of reporting of crime. This data can prove very useful in analysing and predicting crime and help us prevent the crime to some extent. Crime analysis is an area of vital importance in police department. Study of crime data can help us analyse crime pattern, inter-related clues& important hidden relations between the crimes. That is why data mining can be great aid to analyse, visualize and predict crime using crime data set. Classification and correlation of data set makes it easy to understand similarities & dissimilarities amongst the data objects. We group data objects using clustering technique. Dataset is classified on the basis of some predefined condition. Here grouping is done according to various types of crimes against women taking place in different states and cities of India. Crime mapping will help the administration to plan strategies for prevention of crime, further using data mining technique data can be predicted and visualized in various form in order to provide better understanding of crime patterns. Keyword: - K-Means, cluster, correlation. 1. INTRODUCTION India is a vast country with diversified societies. Position of women has been of great importance since ancient times in Indian culture. Unfortunately current scenario depicts a different story. According to National Crime Records Bureau, crime against women has significantly increased in recent years. It has become the most prior to the administration to enforce law & order to reduce this increasing rate of the crime against women. This is where criminology comes into picture. Criminology is scientific study of crime and criminal behaviour in order to detect crime characteristics. Use of data mining techniques can produce important results from crime dataset. The very step in study of crime is crime analysis.crime analysis is exploring, inter relating and detecting relationships between various crimes and characteristics of crimes.police department maintains crime data at the record. This data contains huge amount of data set with complex relationships which needs use of data mining techniques in order to be transformed into useful information. The knowledge extracted from the dataset can be a great tool & support to the police department to prevent crimes. An ideal crime analysis tool should be able to identify crime patterns quickly and in an efficient manner for future crime pattern detection and action. However, in the present scenario, the following major challenges are encountered. Increase in the size of crime information that has to be stored and analysed. 1325 www.ijariie.com 681

Problem of identifying techniques that can accurately and efficiently analyse these growing volumes of crime data Different methods and structures used for recording crime data. The data available is inconsistent and are incomplete thus making the task of formal analysis a far more difficult. Investigation of the crime takes longer duration due to complexity of issues All the above challenges motivated this research work to focus on providing solutions that can enhance the process of crime analysis for identifying and reducing crime in India. The main aim of this research work consist of developing analytical data mining methods that can systematically address the complex problem related to various form of crime. Thus, the main focus is to develop a crime analysis tool that assists the police in Detecting crime patterns and perform crime analysis Provide information to formulate strategies for crime prevention and reduction Identify and analyse common crime patterns to reduce further occurrences of similar incidence The present research work proposes the use of an amalgamation of data mining techniques that are linked with a common aim of developing such a crime analysis tool. For this purpose, the following specific objectives were formulated. To develop a data cleaning algorithm that cleans the crime dataset, by removing unwanted data to explore and enhance clustering algorithms to identify crime patterns from historical data To explore and enhance classification algorithms to predict future crime behaviour based on previous crime trends To develop anomalies detection algorithms to identify change in crime patterns. 2. REVIEW OF LITERATURE D.E. Brown constructed a software framework called ReCAP(Regional Crime Analysis Program) for mining data in order to catch professional criminals using data mining and data fusion techniques. In 2009, Li Ding et al.[11] propose an integrated system called PerSearch that takes a given description of a crime, including its location, type, and the physical description of suspects(personal characteristics or vehicles) as input. To detect suspects, the system will process these inputs through four integrated components: geographic profiling,social network analysis, crime profile, and physical matching. Essentially, geographic profiling determines where" the suspects are, while other components determine the suspects. De Bruin et. al. (2006) introduced a framework for crime trends using a new distance measure for comparing all individuals based on their profiles and then clustering them accordingly.this method also provided a visual clustering of criminal career sand identification of classes of criminals.from the literature study, it could be concluded that crime details increasing to very large quantities running into zota bytes(1024bytes). This in turn is increasing the need for advanced and efficient techniques for analysis. Data mining as an analysis and knowledge discovery tool has immense potential for crime data analysis. As is the case with any other new technology, the requirement of such tool changes, which is further augmented by the new and advanced technologies used by criminals. All these facts confirm that the field is not yet mature and needs further investigations. 3. METHODOLOGY Following figure shows the methodology used for crime pattern analysis. 1325 www.ijariie.com 682

Fig.1 Methodology for crime pattern analysis 3.1. Data Collection Enormous amount of crime data is collected at the end of year at police records. This data is made available by National Crime Bureau of Records. This data is in the form of number of cases recorded all over the nation throughout the year. The data is in raw form and also contains some wrong as well as missing values. Hence preprocessing of data becomes very necessary in order to bring the data in proper and clean form. Pre-processing of data includes data cleansing and PreProcessing. 3.2. Data Classification. We classify the data set into various groups based on certain characteristics of the data object here we group crimes according to states & cities. Classification of the crime is done on the basis of different types of crime.kmeans algorithm can be used to group data with similar characteristics. 3.2.1 K-Means Algorithm. 1325 www.ijariie.com 683

Fig2. K-Mean Algorithm K-means algorithm mainly used to partition the clusters based on their means. Initially number of objects are grouped and specified as k clusters. The mean value is calculated as the mean distance between the objects. The relocation iterative technique which is used to improve the partitions by moving objects from one group to other. Then number of iterations is done until the convergence occurs. K-means algorithm steps are given as Input: Number of clusters. Step1: Arbitrarily choose k objects from a dataset D of N objects as the initial cluster centers. Step 2: reassign each object which distributed to a cluster based on a cluster center which it is the most similar or the nearer. Step 3: Update the cluster means, i.e. calculate the mean value of the object for each cluster. Output: A set of k clusters. K-means algorithm is a base for all other clustering algorithms to find the mean values. 3.3 Correlating Crime Many crimes are related to other crime or criminal. Finding this correlation can be of great help in finding missing clues. Correlations can be used to help make predictions. If two variables have been known in the past to correlate, then we can assume they will continue to correlate in the future. We can use the value of one variable that is known now to predict the value that the other variable will take on in the future.pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations.the form of the definition involves a "product moment, that is, the mean (the first moment about the origin) of the product of the mean-adjusted random variables; hence the modifier product-moment in the name. ρx, Y = COθ(X, Y)/σXσY where: co is covariance. 1325 www.ijariie.com 684

and σx is the standard deviation of X. Pearson product-moment correlation coefficient is a measure of the linear correlation or dependence between two variables X and Y, giving a value between +1and -1, where 1 is called total positive correlation, 0 is no correlation, and -1 is called total negative correlation.figure 5 explains how correlation between X and Y depending on value of r. Fig3.Correlation example of person 3.4 Predicting Crime Prediction of crime is a great aid to the administration in order to curb the crime incidences. Prediction is stating probability of an event in future period time. In this case crime against women can be predicted using linear regression. Prediction about various types of crimes and most probable places of occurrences of crime will be predicted linear regression is an approach for modelling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variable)denoted X. The case of one explanatory variable is called simple linear regression.in simple linear regression, we predict value of one variable from the value of a second variable. The variable we are predicting is called the criterion variable and is referred to as Y. The variable we are basing our predictions on is called the predictor variable and is referred to as X. When there is only one predictor variable, the prediction method is called simple regression. In simple linear regression,the predictions of Y when plotted as a function of X form a straight line. Line a rregression consists of finding the best-fitting straight line through the points. The bestfitting line is called a regression line.the most commonlyused criterion for the best-fitting line is the line that minimizes the sum of the squared errors of prediction. The formula for a regression line is Y = ax + b where,y is the predicted score, b is the slope of the line, and A is the Y intercept. The slope (b) can be calculated as follows: And the intercept (A) can be calculated as A = MY bmx. Where, MX is the mean of, My is the mean of, Sx is the standard deviation of X,Sy is the standard deviation of, and is the correlation between and. 1325 www.ijariie.com 685

Fig4.Linear regression 4. Result and future scope Result of this research will be to analyse, correlate and predict the crimes from huge data set available. Results will be in the form of correlation between various crime and location of crime i.e. state/city. Crime can also be correlated on the basis of age group, location of crime & type of crime. Prediction of the crime will be displayed using various diagrams pie charts, heat maps, spikes and graphs. REFERENCES [1] K.S.Arthisree, M.E, A.Jaganraj, Arulmigu Meenakshi, Identify Crime Detection Using Data Mining Techniques,International Journal of Advanced Research in Computer Science and Software Engineering, Amman College Of Engineering, Thiruvannamalai district, Near Kanchipuram, India,volume 3,Issue 8,August 2013. [2] B.Chandra and M. P.Gupta, Crime Data Mining for Indian Police Information System Manish Gupta,Indian Institute of Technology Delhi, Hauz Khas, New Delhi - 110 016,India. [3] D.E. Brown, The regional crime analysis program (RECAP): A frame work for mining data to catch criminals, In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Vol. 3, pp. 2848-2853, 1998. [4] Ram Singla, Deepak Dembla, Yogesh Chaba and Kevika Singla CITM, An Optimal KD Model for Crime Pattern Detection Based on Semantic Link Analysis,A Data Mining Tool C,Faridabad. [5] Malathi.,Dr. S. Santhosh Baboo Reader, An Enhanced Algorithm to Predict a Future Crime using Data Mining,International Journal of Computer Applications (09758887),Volume 21 No.1, May 2011. [6] TongWang1, Cynthia Rudin1, Daniel Wagner, and Rich Sevieri, Learning to detect patterns of crime, JECET, 1:124131, 2012. [7] Revatthy Krishnamurthy,J. Satheesh Kumar, SURVEY OF DATA MINING TECHNIQUES ON CRIME DATA ANALYSIS, International Journal of Data Mining Techniques and Applications Vol 01, Issue 02, December 2012. [8] Shyam Varan Nath Florida, Crime Pattern Detection Using Data Mining, Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT 2006 Workshops) (WI-IATW06). [9] Priyanka Gera, Rajan Vohra, City Crime Profiling Using Cluster Analysis,International Journal of Computer Science and Information Technologies, Vol. 5 (4), 2014. [10] Malathi. A and Dr. S. Santhosh Baboo, Enhanced Algorithms to Identify Change in Crime Patterns, International Journal of Combinatorial Optimization Problems and Informatics, Vol. 2, No.3, Sep-Dec 2011, pp. 32-38, ISSN: 200 1325 www.ijariie.com 686