A Review of Data Mining Techniques


Available Online at www.ijcsmc.com

International Journal of Computer Science and Mobile Computing
A Monthly Journal of Computer Science and Information Technology
ISSN 2320-088X

IJCSMC, Vol. 3, Issue. 4, April 2014, pg. 1401-1406

REVIEW ARTICLE

Sukhdev Singh Ghuman
Dept. of Comp. Sci., SBDSM Khalsa College, Domeli (Kapurthala), Punjab, India
ghumanggg@gmail.com

Abstract- Information technology has revolutionized the world with cheaper and faster communication through different modes. The devices involved generate large volumes of data that must be processed to extract useful patterns or information. Database technologists seek means to store, manipulate, and retrieve data, while the data mining field strives to find new and efficient techniques for extracting information from vast amounts of data. Data mining is also referred to by names such as Knowledge Discovery in Databases (KDD), Predictive Analytics, or Data Science. The techniques used for extraction include genetic algorithms, decision trees, artificial neural networks, rule induction, and visualization. Data mining is generally an iterative and interactive discovery process. The goal of this process is to mine patterns, associations, changes, anomalies, and statistically significant structures from large amounts of data.

Keywords: Data Mining, Patterns, Knowledge Discovery, Database, Techniques

I. INTRODUCTION

Data mining is the process of discovering patterns in large data sets. Its purpose is to extract information from large data sets and convert it into usable structures, so that the information can be used for further processing without difficulty. The data is held in databases, and its handling is a database management concern. The term is also used loosely for any kind of large-scale data processing. The term data mining came into use in computer science around 1990.
Data mining is also known by several other names, such as Knowledge Discovery in Databases (KDD), Predictive Analytics, or Data Science [1]. Data mining is generally an iterative and interactive discovery process. The goal of this process is to mine patterns, associations, changes, anomalies, and statistically significant structures from large amounts of data [3]. The mined results should be valid, novel, useful, and understandable.

This paper gives a brief introduction to data mining in Section I. Section II illustrates the process of data mining, and Section III reviews different data mining techniques. Section IV is devoted to Knowledge Discovery in Databases (KDD), and Section V discusses some issues relating to data mining. The last section presents the conclusion.

2014, IJCSMC All Rights Reserved

II. PROCESS OF DATA MINING

The process of data mining is sequential and consists of the following steps, summarized in Figure 1 [3].

Figure 1: Data mining process [3]

1. Extract, transform, and load transaction data onto the data warehouse system.
2. Store and manage the data in a multidimensional database system.
3. Provide data access to business analysts and information technology professionals.
4. Analyze the data by application software.
5. Present the data in a useful format, such as a graph or table.

III. TECHNIQUES OF DATA MINING

Data mining is a complex process that requires not only fast processing devices but also good and efficient data processing techniques. The important techniques of data mining are listed below.

Artificial neural networks: AI techniques are widely used in data mining. Techniques such as pattern recognition, machine learning, and neural networks are very useful. Many other techniques in AI, such as knowledge acquisition, knowledge representation, and search, are relevant to the various process steps in data mining [4]. An artificial neural network is a non-linear predictive model that learns through training and resembles biological neural networks in structure.

Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution. This is a relatively new software paradigm inspired by Darwin's theory of evolution. A population of rules, each representing a possible solution to a problem, is initially created at random. Pairs of rules are then combined to produce offspring for the next generation. A mutation process randomly modifies the genetic structures of some members of each new generation. The system runs for dozens or hundreds of generations, and the process terminates when an acceptable or optimum solution is found, or after some fixed time limit. Genetic algorithms are appropriate for problems that require optimization with respect to some computable criterion, so the paradigm can be applied to data mining problems. Large and complex problems require a fast computer to obtain appropriate solutions in a reasonable amount of time. Mining large data sets with genetic algorithms has become practical only recently, owing to the availability of affordable high-speed computers [4].

Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi-Square Automatic Interaction Detection (CHAID). Both provide a set of rules that can be applied to a new dataset to predict which records will have a given outcome [4].
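As a rough illustration of how a decision tree method turns historical records into a rule that can be applied to new records, the sketch below fits a one-level tree (a "decision stump") on a single numeric feature. It uses Gini impurity to choose the split, in the spirit of CART; it is not a full CART or CHAID implementation. The field names `income` and `label`, the sample records, and all function names are hypothetical and chosen only for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(records, feature):
    """Return (threshold, weighted_gini) for the best 'feature <= threshold' test.

    Assumes the feature takes at least two distinct values in `records`.
    """
    values = sorted({r[feature] for r in records})
    best = (None, float("inf"))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [r["label"] for r in records if r[feature] <= threshold]
        right = [r["label"] for r in records if r[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(records)
        if score < best[1]:
            best = (threshold, score)
    return best


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(records, feature):
    """Learn a single if-then-else rule from labelled historical records."""
    threshold, _ = best_split(records, feature)
    left = majority([r["label"] for r in records if r[feature] <= threshold])
    right = majority([r["label"] for r in records if r[feature] > threshold])
    return {"feature": feature, "threshold": threshold, "left": left, "right": right}


def predict(rule, record):
    """Apply the learned rule to a new, unlabelled record."""
    if record[rule["feature"]] <= rule["threshold"]:
        return rule["left"]
    return rule["right"]


# Hypothetical historical data: did a customer respond to an offer?
history = [
    {"income": 20, "label": "no"},
    {"income": 25, "label": "no"},
    {"income": 30, "label": "no"},
    {"income": 45, "label": "yes"},
    {"income": 50, "label": "yes"},
    {"income": 60, "label": "yes"},
]

rule = fit_stump(history, "income")
print(rule)
print(predict(rule, {"income": 28}), predict(rule, {"income": 55}))
```

A full tree would apply the same split search recursively to each branch and across all available features; the single split is kept here so that the learned rule stays readable.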
Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). It is sometimes called the k-nearest neighbor technique.

Rule induction: This technique extracts useful if-then rules from data based on statistical significance.

Data visualization: This technique is concerned with the visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships. A picture is worth thousands of numbers. Visual data mining techniques have proven their value in exploratory data analysis, and they also have good potential for mining large databases. This approach requires the integration of humans into the data mining process.
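The nearest neighbor method described above can be sketched in a few lines. This is a minimal illustration, assuming numeric features, Euclidean distance as the similarity measure, and a simple majority vote as the "combination of the classes"; the source fixes none of these choices. The function name `knn_classify` and the sample records are hypothetical.

```python
import math
from collections import Counter


def knn_classify(history, query, k=3):
    """Classify `query` by majority vote among its k nearest historical records.

    `history` is a list of (features, label) pairs, where `features` is a tuple
    of numbers with the same length as `query`. Distance is Euclidean.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort the historical records by distance to the query and keep the k closest.
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical historical dataset with two numeric features per record.
history = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.5), "low"),
    ((6.0, 6.5), "high"),
    ((7.0, 6.0), "high"),
    ((6.5, 7.0), "high"),
]

print(knn_classify(history, (1.8, 1.6), k=3))
print(knn_classify(history, (6.2, 6.4), k=3))
```

With k = 1 the method reduces to copying the class of the single most similar record; larger values of k smooth out the effect of individual noisy records at the cost of blurring class boundaries.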

IV. KNOWLEDGE DISCOVERY IN DATABASES

Data mining, also popularly known as Knowledge Discovery in Databases (KDD), refers to the nontrivial extraction of implicit, previously unknown, and potentially useful information from data in databases. While data mining and knowledge discovery in databases (KDD) are frequently treated as synonyms, data mining is actually one part of the knowledge discovery process. The KDD process comprises a few steps leading from raw data collections to some form of new knowledge. The iterative process consists of the following steps [5]:

Data cleaning: also known as data cleansing, a phase in which noisy and irrelevant data are removed from the collection.

Data integration: at this stage, multiple data sources, often heterogeneous, may be combined into a common source.

Data selection: at this step, the data relevant to the analysis is decided on and retrieved from the data collection.

Data transformation: also known as data consolidation, a phase in which the selected data is transformed into forms appropriate for the mining procedure.

Data mining: the crucial step, in which clever techniques are applied to extract potentially useful patterns.

Pattern evaluation: in this step, the truly interesting patterns representing knowledge are identified based on given measures.

Knowledge representation: the final phase, in which the discovered knowledge is visually presented to the user. This essential step uses visualization techniques to help users understand and interpret the data mining results.

V. ISSUES IN DATA MINING

Data mining algorithms embody techniques that have sometimes existed for many years, but have only lately been applied as reliable and scalable tools that time and again outperform older classical statistical methods. While data mining is still in its infancy, it is becoming increasingly popular and ubiquitous. Before data mining develops into a conventional, mature, and trusted discipline, many pending issues have to be addressed. Some of these issues are discussed below. Note that they are not mutually exclusive and are not ordered in any way [2].

Security and social issues: Security is an important issue with any data collection that is intended to be shared. A closely related issue is individual privacy. Data mining makes it possible to analyze routine business transactions and glean a significant amount of information about individuals' buying habits and preferences.

Data integrity: Data analysis can only be as good as the data being analyzed. A key implementation challenge is integrating conflicting or redundant data from different sources. For example, a bank may maintain credit card accounts on several different databases, and the address (or even the name) of a single cardholder may differ in each. Software must translate data from one system to another and select the address most recently entered.

Mining methodology: An important technical issue is whether it is better to set up a relational database structure or a multidimensional one. In a relational structure, data is stored in tables, permitting ad hoc queries. In a multidimensional structure, on the other hand, sets of cubes are arranged in arrays, with subsets created according to category. While multidimensional structures facilitate multidimensional data mining, relational structures have so far performed better in client/server environments. And, with the explosion of the Internet, the world is becoming one big client/server environment.

Cost: Finally, there is the issue of cost. While system hardware costs have dropped dramatically within the past five years, data mining and data warehousing tend to be self-reinforcing. The more powerful the data mining queries, the greater the utility of the information gleaned from the data, and the greater the pressure to increase the amount of data being collected and maintained. This in turn increases the pressure for faster, more powerful data mining queries, and hence for larger, faster systems, which are more expensive [5].

Data source issues: There are many issues related to data sources. Some are practical, such as the diversity of data types, while others are philosophical, such as the data glut problem.

VI. CONCLUSION

Data mining is concerned with extracting useful rules or interesting patterns from the bulk of data collected through various sources. Many data mining techniques are available to perform this job efficiently. No single technique suits all types of data; the appropriate technique depends on the type of data at hand. Sometimes a hybrid of techniques is more useful than a single technique.

REFERENCES

[1] http:/www.wikipadeia.com
[2] http://www.anderson.ucla.edu
[3] Mohammed J. Zaki, DATA MINING TECHNIQUES, August 2003
[4] Sang Jun Lee, Keng Siau, A Review of Data Mining Techniques, Industrial Management & Data Systems 101/1 (2001) 41-46
[5] Dr. Rajni Jain, Introduction to Data Mining Techniques