Study and Analysis of Data Mining Concepts




M. Parvathi
Head, Department of Computer Applications, Senthamarai College of Arts and Science, Madurai, Tamil Nadu, India

Dr. S. Thabasu Kannan
Principal, Pannai College of Engineering and Information Technology, Madurai, Tamil Nadu, India

Abstract- Data mining is the process of finding useful patterns in large amounts of data. It predicts future trends and behaviors, allowing businesses to make decisions. This paper discusses a few data mining techniques and algorithms, along with some of the organizations that have adopted data mining technology to improve their businesses and found excellent results. It also discusses the architecture of data mining systems and the tasks and major issues of data mining.

Keywords: Data mining techniques, Data mining algorithms, Tasks and Issues.

I. INTRODUCTION

Data mining has attracted a great deal of attention in the information industry in recent years, chiefly because of the wide availability of huge amounts of data and the imminent need to turn such data into useful information and knowledge. The information and knowledge gained can be used for applications ranging from business management, production control, and market analysis to engineering design and science exploration.

Data mining can be viewed as a result of the natural evolution of information technology. An evolutionary path has been witnessed in the database industry in the development of the following functionalities: data collection and database creation, data management (including data storage and retrieval, and database transaction processing), and data analysis and understanding (involving data warehousing and data mining). For instance, the early development of data collection and database creation mechanisms served as a prerequisite for the later development of effective mechanisms for data storage and retrieval, and for query and transaction processing.
With numerous database systems offering query and transaction processing as common practice, data analysis and understanding has naturally become the next target.

II. EVOLUTION OF DATABASE

Database technology since the mid-1980s has been characterized by the popular adoption of relational technology and an upsurge of research and development activities on new and powerful database systems. These systems employ advanced data models such as extended relational, object-oriented, object-relational, and deductive models. Application-oriented database systems, including spatial, temporal, multimedia, active, and scientific databases, knowledge bases, and office information bases, have flourished. Issues related to the distribution, diversification, and sharing of data have been studied extensively. Heterogeneous database systems and Internet-based global information systems such as the World-Wide Web (WWW) have also emerged and play a vital role in the information industry.

Vol. 5 Issue 1 January 2015 280 ISSN: 2278-621X

Figure 1. Evolution of Database

Data Collection and Database Creation (1960s and earlier)
Database Management Systems (1970s to early 1980s): hierarchical and network database systems, relational database systems, data modeling tools, query languages, on-line transaction processing (OLTP)
Advanced Database Systems (mid-1980s to present): advanced models, advanced applications
Advanced Data Analysis (late 1980s to present): data warehouse and OLAP, data mining and knowledge discovery, data mining applications
Web-based Databases (1990s to present): XML-based database systems, integration with information retrieval, data and information integration
New Generation of Data Integration and Information Systems (present and future)

III. EVOLUTION AND FOUNDATIONS OF DATA MINING

Data mining is an application for business and is supported by three technologies:

Massive data collection
Multiprocessor computers
Data mining algorithms

Steps in the evolution of data mining:

Data collection (1960): computers, tapes and disks
Data access (1980): RDBMS, SQL, ODBC
Data warehousing and decision support (1990): online analytic processing (OLAP), multidimensional databases, data warehouses
Data mining: advanced algorithms, multiprocessor computers, massive databases

IV. DATA MINING DEFINITION

Data mining is the process of extracting, or mining, useful information and patterns from huge amounts of data. It is also called the knowledge discovery process, knowledge mining from data, knowledge extraction, or data/pattern analysis. The mined information may be any relationship among the data items. To be useful, it should be valid, actionable and previously unknown.

Figure 2. Data Mining Process

Problem definition
Data gathering and preparation: data access, data sampling, data transformation
Model building and evaluation: create model, test model, evaluate and interpret model
Knowledge deployment: model apply, custom reports, external applications

Data mining is a logical process used to search through large amounts of data in order to find useful data. The goal of this technique is to find patterns that were previously unknown. Once these patterns are found, they can be used to make decisions for the development of a business. Three steps are involved:

Exploration
Pattern identification
Deployment

Exploration: In the first step, the data is cleaned and transformed into another form, and the important variables and the nature of the data are determined based on the problem.

Pattern Identification: Once the data has been explored, refined and defined for the specific variables, the second step is pattern identification: identify and choose the patterns that make the best prediction.

Deployment: The patterns are deployed to achieve the desired outcome.
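The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not a method taken from the paper: the records, the candidate thresholds and the function names `explore`, `identify_pattern` and `deploy` are all hypothetical, and the "pattern" is reduced to a single threshold rule chosen by accuracy.

```python
from statistics import mean

# Hypothetical raw records: (monthly_visits, bought_premium).
# None marks a missing value that exploration has to deal with.
raw = [(2, False), (9, True), (None, False), (7, True),
       (1, False), (8, True), (3, False)]

def explore(records):
    """Exploration: drop incomplete records and summarize the key variable."""
    clean = [(x, y) for x, y in records if x is not None]
    summary = {"n": len(clean), "mean_visits": mean(x for x, _ in clean)}
    return clean, summary

def identify_pattern(clean, thresholds):
    """Pattern identification: keep the threshold rule that predicts best."""
    def accuracy(t):
        # The rule "visits >= t implies a purchase" is scored on known outcomes.
        return sum((x >= t) == y for x, y in clean) / len(clean)
    best = max(thresholds, key=accuracy)
    return best, accuracy(best)

def deploy(threshold, new_visits):
    """Deployment: apply the chosen rule where the outcome is not yet known."""
    return [x >= threshold for x in new_visits]

clean, summary = explore(raw)
best_t, acc = identify_pattern(clean, thresholds=[2, 5, 8])
predictions = deploy(best_t, [4, 10])
print(summary, best_t, acc, predictions)
```

In a real project each step would be far richer (transformation, several candidate models, held-out evaluation), but the division of work between the three functions mirrors the three steps.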

V. ARCHITECTURE OF DATA MINING

Figure 3. Architecture of Data Mining (components shown: User Interface; Pattern Evaluation; Data Mining Engine; Knowledge Base; Data Warehouse and Database Server; Data Cleaning, Integration and Selection; Database, Data Warehouse, World Wide Web, Other Repositories)

a. Database, Data Warehouse or other Information Repository
This is one or a set of databases, data warehouses, or other information repositories. Data cleaning and integration techniques may be applied to the data.

b. Database and Data Warehouse Server
It is responsible for fetching the data based on the user's data mining request.

c. Knowledge base
It is used to guide the search or to evaluate the interestingness of the resulting patterns. It uses concept hierarchies to organize the attributes.

d. Data Mining Engine
It is the essential component. It consists of functional modules for tasks such as characterization, classification and clustering.

e. Pattern Evaluation
It applies interestingness measures and interacts with the data mining modules to focus the search on interesting patterns. It is necessary to confine the search to only the interesting patterns.

f. GUI
It communicates between the user and the data mining system. The user may interact with the system by specifying data mining queries. It allows the user to browse the database and data structures, evaluate patterns, and visualize the patterns in different forms.

VI. DATA MINING TASKS

Data mining provides the link between transaction systems and analytical systems. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. The tasks are classified into two kinds of methods. Predictive methods use some variables to predict unknown values of other variables. Descriptive methods identify the patterns or relationships in the data.

Figure 4. Tasks of Data Mining
Predictive: Classification, Regression, Time Series Analysis, Prediction
Descriptive: Clustering, Summarization, Association Rules, Sequence Discovery

a. Classification
It maps the data into predefined groups or classes. The classes are determined before the data are examined. It is also used to place stored data into the predefined groups.

b. Clustering
In this method the groups are not predefined; they are defined by the data. It determines the similarity among the data on predefined attributes. The data are grouped into clusters, and the grouping is based on logical relationships.

c. Association rules
It identifies data items that are associated with each other. It is often used in the retail sales community to find items that are frequently purchased together.

d. Sequence pattern discovery
It is used to determine sequential patterns in data. It is based on a time sequence of actions. It is similar to association, except that the relationships are based on time.
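To make the association rules task (item c) concrete, the sketch below scores rules of the form "customers who buy X also buy Y" on a handful of market baskets. The paper does not name any measures; support (the fraction of baskets containing the items) and confidence (how often the rule holds when its left side is present) are the standard ones and are assumed here. The basket contents, the thresholds and the helper names `support` and `pair_rules` are hypothetical.

```python
from itertools import combinations

# Hypothetical market baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def pair_rules(baskets, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, baskets)
        if pair_support < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: share of baskets with lhs that also contain rhs.
            confidence = pair_support / support({lhs}, baskets)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

rules = pair_rules(transactions, min_support=0.5, min_confidence=0.7)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

Only pairs are considered to keep the sketch short. Practical algorithms such as Apriori extend the same idea to larger itemsets and prune the search using the minimum support threshold.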

e. Regression
It assumes that the target data fit some known type of function. It then determines the best function of this type for the given data.

f. Time series analysis
The value of an attribute is examined as it varies over time. The values are obtained over a limited time period.

g. Prediction
It predicts future data states based on past and current data. It predicts a future state rather than the current state. Applications include flood prediction, speech recognition and pattern recognition.

h. Summarization
It maps the data into subsets with associated simple descriptions. It is also known as characterization or generalization. It derives representative information from the database and characterizes the contents of the database.

VII. DATA MINING ISSUES

Figure 5. Issues of Data Mining (data mining shown together with related disciplines: statistics, database technology, visualization, information science, other disciplines)

a. Human interaction
Interfaces may be needed with both domain experts and technical experts. Experts formulate the queries and interpret the results. Users identify the data and the desired results.

b. Over fitting
A model generated from a given database should also fit future states of the database. Overfitting occurs when it does not. It may arise when the model is created from a small database, and it may arise even though the data have not changed.
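The regression task (item e) and the overfitting issue (item b) can be illustrated together. The sketch below is an assumption-laden toy, not an experiment from the paper: the data points are made up, `fit_line` is an ordinary least-squares fit of a straight line (the "known type of function"), and `memorize` is a deliberately overfit model that simply returns the target of the nearest training point.

```python
# Hypothetical data following roughly y = 2x, with small noise in the training set.
train = [(0, 0.5), (1, 1.5), (2, 4.5), (3, 5.5), (4, 8.5)]
test = [(0.4, 0.8), (1.4, 2.8), (2.4, 4.8), (3.4, 6.8)]  # unseen, noise-free

def fit_line(points):
    """Regression: least-squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def memorize(points):
    """Overfit model: answer with the target of the closest training point."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]

def mse(model, points):
    """Mean squared error of a model on a list of (x, y) points."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)

line, lookup = fit_line(train), memorize(train)
errors = {
    "line": (mse(line, train), mse(line, test)),
    "memorizer": (mse(lookup, train), mse(lookup, test)),
}
for name, (train_err, test_err) in errors.items():
    print(f"{name}: train MSE={train_err:.2f}, test MSE={test_err:.2f}")
```

The memorizing model reproduces the training data perfectly yet does worse than the straight line on the unseen points, which is the behavior the overfitting item warns about when a model is built from a small database.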

c. Outliers
Some data entries do not fit the derived model. A model built to include these outliers may not behave well for data that are not outliers.

d. Interpretation of results
Experts are needed to interpret the results. The output may be meaningless to average database users.

e. Visualization of results
It is helpful for viewing the output of data mining algorithms.

f. Large data sets
Algorithms designed for small data sets may run into problems with the large data sets associated with data mining.

g. High dimensionality
Not all the attributes are needed to solve a given problem. Using unneeded attributes may increase the complexity and decrease the efficiency of an algorithm.

h. Multimedia data
Earlier data mining algorithms were designed for traditional data types. Some newer algorithms make use of multimedia data.

i. Missing data
Missing data may be replaced with estimates. Missing data can lead to invalid results.

j. Irrelevant data
Some data may be of no use in developing the data mining task.

k. Noisy data
Some data are invalid or incorrect. They must be corrected before the data mining application is run.

l. Changing data
Databases are not static, yet data mining algorithms assume a static database. The algorithms have to be run again whenever changes occur in the database.

m. Integration
Integrating data mining functions into traditional DBMS systems is desirable.

n. Application
It is necessary to determine the use for the information obtained from the data mining function.

VIII. HOW DATA MINING WORKS

Data mining is part of a process called knowledge discovery from databases. It involves statistics, machine learning, artificial intelligence, information retrieval and pattern recognition.

Figure 6. Working Principles of Data Mining (stages shown: understanding of the business problem, identification, collecting relevant data, model building, business strategy and evaluation, action, learning)

a. Modeling
Modeling means building a model on data from an existing situation where the answer is known and then applying the model to other situations where the answer is not known. People have been doing this for a long time. Data storage and communication are no longer obstacles. Lots of information about a variety of situations where the answer is known is loaded. The data mining software filters out the characteristics of the data that should go into the model. Once the model is built, it can be used in similar situations where the answer is not known.

b. Discovery
Discovery means finding something new. Data mining tools sweep through databases and identify previously hidden patterns. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together.

c. Prediction
Prediction means finding the reason: find a pattern that is associated with a very specific event or attribute.

d. Over fitting
The term overfitting is used in both the data mining and the statistical communities.

IX. CONCLUSION

Data mining involves extracting useful rules or interesting patterns from historical data. There are many data mining tasks, and each of them has many techniques. The no free lunch theorem applies: no single technique is suitable for all kinds of data in all types of domains. Hybrid techniques have sometimes been observed to perform better than pure ones. Data mining is a decision support process in which we search for patterns of information in data. Data mining techniques include classification, clustering, prediction, association and sequential patterns. Commercial, educational and scientific applications are increasingly dependent on these methodologies. Decision trees are a reliable and effective decision-making technique that provides high classification accuracy with a simple representation of the collected knowledge. They help experts to validate and classify the results and outcomes of tests and to analyze various new symptoms of diseases based on data. Thus, data mining can play an important role in the field of medicine, health care and disease prediction.