
Data Mining and Knowledge Discovery
Generating knowledge from data

White Paper

Organizations collect a vast amount of data in the process of carrying out their day-to-day business. The opening up of information systems through the World Wide Web has helped organizations amass huge amounts of data, ranging from interactions to transactions, that previously could not be collected at all or could be collected only at high cost. Organizations have perfected various techniques for churning this data into reports that provide facts and figures. This in turn has created "information overload" rather than helping these organizations glean knowledge from the data.

Data mining is the process of discovering and extracting patterns from data by applying a set of algorithms. When these patterns are analyzed with the help of prior knowledge and proper interpretation, the process is called Knowledge Discovery. The field of Knowledge Discovery in Databases (KDD) deals with data mining and with interpreting the data mining results to create knowledge from databases. In this paper we discuss the KDD process, data mining algorithms, and the benefits of practicing KDD to businesses.

Data mining involves determining patterns from, or fitting models to, observed data. A typical data mining system may perform one or more of the following tasks:

Association: Association is the discovery of correlations among a set of items. The output is often expressed as a rule showing attribute-value conditions that frequently occur together. This type of analysis is widely used in analyzing data for direct marketing campaigns, sales catalog design and many other business decision-making processes.
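One common way to score a rule of this kind is by its support (how often the conditions occur) and its confidence (how often the conclusion holds when the conditions do). The following is a minimal sketch under stated assumptions, not the method of any particular tool: the customer records, attribute names and function names are all invented for illustration.

```python
from typing import FrozenSet, List

Record = FrozenSet[str]


def support(records: List[Record], items: Record) -> float:
    """Fraction of records that contain every attribute value in `items`."""
    if not records:
        return 0.0
    return sum(1 for r in records if items <= r) / len(records)


def confidence(records: List[Record], antecedent: Record, consequent: Record) -> float:
    """Among records matching `antecedent`, the fraction that also match `consequent`."""
    matching = [r for r in records if antecedent <= r]
    if not matching:
        return 0.0
    return sum(1 for r in matching if consequent <= r) / len(matching)


# Hypothetical customer records, each encoded as a set of attribute values.
CUSTOMERS: List[Record] = (
    [frozenset({"age=20-29", "income=40-50K", "buys=dvd_player"})] * 4
    + [
        frozenset({"age=20-29", "income=40-50K"}),
        frozenset({"age=30-39", "income=40-50K", "buys=dvd_player"}),
        frozenset({"age=30-39", "income=60-70K"}),
        frozenset({"age=40-49", "income=40-50K"}),
        frozenset({"age=20-29", "income=20-30K", "buys=dvd_player"}),
        frozenset({"age=50-59", "income=60-70K"}),
    ]
)

# Rule under test: {age=20-29, income=40-50K} -> {buys=dvd_player}
ANTECEDENT: Record = frozenset({"age=20-29", "income=40-50K"})
CONSEQUENT: Record = frozenset({"buys=dvd_player"})

if __name__ == "__main__":
    print("support of conditions:", support(CUSTOMERS, ANTECEDENT))
    print("confidence of rule:", confidence(CUSTOMERS, ANTECEDENT, CONSEQUENT))
```

Association mining algorithms typically search for all rules whose support and confidence exceed chosen thresholds; the sketch only scores a single, hand-picked rule.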
Example - An association model might discover that, among all electronics customers under study, those aged 20-29 (10% of the set) with an income of 40-50K buy DVD players with 80% probability.

Classification: Classification analyzes a set of training data and constructs a model for each class based on the features of the data. A decision tree or a set of classification rules is generated, which can then be used both to understand each class better and to classify future data. A rich set of classification methods is inherited from machine learning, neural networks, statistics and other fields.
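To make the idea of generating a rule from training data concrete, here is a minimal sketch of the simplest possible decision tree: a one-level tree (a "decision stump") that learns a single threshold on one numeric feature. This is an illustration only, not the method used by any specific product; the feature (years in current job), the labels and the training values are hypothetical.

```python
from typing import List, Tuple

Stump = Tuple[float, str, str]  # (threshold, label if value <= threshold, label otherwise)


def learn_stump(values: List[float], labels: List[str]) -> Stump:
    """Pick the threshold and pair of labels that misclassify the fewest training rows."""
    classes = sorted(set(labels))
    points = sorted(set(values))
    # Candidate thresholds are the midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for threshold in candidates:
        for low in classes:
            for high in classes:
                errors = sum(
                    1
                    for v, y in zip(values, labels)
                    if (low if v <= threshold else high) != y
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, low, high)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2], best[3]


def classify(stump: Stump, value: float) -> str:
    """Apply the learned rule to a new observation."""
    threshold, low, high = stump
    return low if value <= threshold else high


# Hypothetical training data: years in current job and a known risk label.
YEARS_IN_JOB = [0.5, 1, 2, 3, 6, 8, 10, 15]
RISK = ["bad", "bad", "bad", "bad", "good", "good", "good", "good"]

if __name__ == "__main__":
    stump = learn_stump(YEARS_IN_JOB, RISK)
    print("learned rule:", stump)
    print("new applicant with 7 years:", classify(stump, 7))
```

A full decision tree repeats this kind of split recursively over several features; the stump shows only the core step of turning labeled training data into a reusable rule.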

Classification has been quite useful for customer segmentation and credit analysis.

Example - A customer can be evaluated as a good or bad risk depending on income range, number of years in the job and the amount of debt he or she is carrying.

Prediction: Classification can be used to predict the class label of data objects. Prediction can also be used to estimate missing data values. The classification produces the business rule or decision tree from which the prediction is made. Genetic algorithms, regression analysis and neural networks are the techniques most commonly used for this purpose.

Example - A customer's potential expenditure using a credit card can be predicted from the expenditure distribution of similar customers using that credit card.

Clustering: A cluster is a collection of objects that are similar to one another. Clustering analysis refers to identifying clusters embedded in the data. A good clustering method produces high-quality clusters, meaning that inter-cluster similarity is low and intra-cluster similarity is high. Clustering is very commonly used for customer segmentation and for deriving marketing strategies.

Example - The customer base may be clustered around sets of attributes that determine cluster membership, for example the location, income group and age group of the customers.

[Figure: clusters labeled Group 1, Group 2 and Group 3, with cluster centers marked (+)]

Time-Series Analysis: This analysis is used to find regularities and interesting characteristics of data that varies over time. It looks for sequential patterns, periodicity, trends and deviations.

Example - Time-series analysis may be used to predict sales quantities for different SKUs, based on demand pattern, market conditions and competitors' performance.
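Two of the ideas named under time-series analysis, trends and deviations, can be sketched with a moving average. The code below is a rough illustration under stated assumptions: the weekly sales figures are invented, and the window size and the two-standard-deviation cut-off are arbitrary choices, not recommendations from this paper.

```python
from statistics import mean, pstdev
from typing import List


def moving_average(series: List[float], window: int) -> List[float]:
    """Smooth the series: each output is the mean of `window` consecutive values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        mean(series[i - window + 1 : i + 1])
        for i in range(window - 1, len(series))
    ]


def deviations(series: List[float], window: int, k: float = 2.0) -> List[int]:
    """Indices whose value lies more than k standard deviations away from
    the mean of the `window` values immediately before it."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window : i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma > 0 and abs(series[i] - mu) > k * sigma:
            flagged.append(i)
    return flagged


# Hypothetical weekly unit sales for a single SKU.
WEEKLY_SALES = [100, 102, 98, 101, 99, 150, 100, 103]

if __name__ == "__main__":
    print("trend:", moving_average(WEEKLY_SALES, 4))
    print("unusual weeks:", deviations(WEEKLY_SALES, 4))
```

The most recent moving-average value can serve as a naive forecast for the next period. Practical sales forecasting would normally also account for seasonality and for external factors such as market conditions.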
Example Applications

Retail/Marketing
- Identify buying patterns from customers
- Find associations among customer demographic characteristics
- Predict response to mailing campaigns

Banking
- Detect patterns of fraudulent credit card use
- Classify customers for target marketing
- Predict customers likely to change their credit card affiliation
- Determine credit card spending by customer groups

TranSys Technologies

Insurance
- Claims analysis
- Predict which customers will buy new policies
- Identify behavior patterns of risky customers
- Identify fraudulent behavior

Telecommunication
- Call behavior analysis
- Churn analysis
- Fraud detection
- Call center performance

E-commerce
- Recommendation system
- Website access profiling
- Personalization
- Clickstream analysis for the Web

Process of Data Mining

A systematic approach is essential to successful data mining. The process model described in this section has been used effectively. Note that the data mining process is not linear: the loops in the process model indicate that one or more previous steps may be revisited depending on the result at a given step. For example, the results of the data exploration phase may require you to add new data to the database. Usually a number of initial models are built before arriving at a satisfactory one. The following is a brief description of the data mining process phases adopted by TranSys in providing a data mining solution:

1. Business Definition Phase

A prerequisite to knowledge discovery is a clear understanding of the business environment. This is required to appreciate opportunities for improvement, to prepare the data for mining, and to interpret the results correctly. A clear statement of business objectives makes the best use of the data mining effort. TranSys will work with clients to define the business objective clearly. This definition stage will include a way of measuring the results of the data mining project and a cost justification.

2. Data Building Phase

In this phase the data to be mined is collected in a database. Depending on the amount and complexity of the data, a flat file or even a spreadsheet is often adequate. The required data may be sourced from a data warehouse, which ensures the required cleanliness of the data. Data from external sources may also have to be integrated.
TranSys will perform the following tasks to achieve the objectives of this phase:
- Collect the required data
- Select the subset of data to be mined
- Assess data quality and, if required, cleanse the data
- Consolidate and integrate the data
- Load the data mining database

The Business Definition Phase and the Results Deployment Phase govern the effectiveness of the entire process.

[Figure: The Data Mining Process, showing the phases in order: Business Definition Phase, Data Building Phase, Data Exploration Phase, Data Preparation Phase, Model Building Phase, Model Evaluation Phase, Results Deployment]

3. Data Exploration Phase

Understanding the data is very important. Graphing and visualization tools are a vital aid in understanding and preparing data. Data visualization most often provides the "Aha!" that leads to new insights and success. Some of the most common and useful graphical displays are histograms and box plots, which show distributions of values. TranSys will work closely with the functional team members to identify the attributes and fields that matter most in predicting an outcome and to determine which derived values may be useful. They will use visualization, link analysis and other means of exploring the data.

4. Data Preparation Phase

This is the final phase before building models. It is often a good idea to sample the data when the database is large; if done carefully, this results in little or no loss of information. Data that is clearly extraneous needs to be identified and discarded. It is often necessary to construct new variables derived from the raw data. For example, forecasting credit risk using a debt-to-income ratio rather than debt and income as separate predictor variables may yield more accurate results that are also easier to understand. Data may also need to be discretized: for example, some decision tree algorithms used for classification require continuous data such as income to be grouped into ranges or bins (High, Medium and Low, or given ranges). The cut-off points for the bins may change the outcome of a model.
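The two preparation steps just described, deriving a debt-to-income ratio and discretizing a continuous variable, can be sketched as follows. This is an illustrative sketch only: the function names, the applicant figures and both sets of cut-off points are assumptions made up for the example.

```python
from bisect import bisect_right
from typing import Sequence


def debt_to_income(debt: float, income: float) -> float:
    """Derived variable: total debt divided by income."""
    if income <= 0:
        raise ValueError("income must be positive")
    return debt / income


def discretize(value: float, cut_points: Sequence[float], labels: Sequence[str]) -> str:
    """Map a continuous value to a bin label.

    `cut_points` must be sorted in ascending order and `labels` must have one
    more entry than `cut_points`. A value equal to a cut point falls into the
    higher bin.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect_right(cut_points, value)]


BINS = ("Low", "Medium", "High")

# Two hypothetical sets of cut-off points for annual income.
CUTS_A = (30_000, 70_000)
CUTS_B = (50_000, 90_000)

if __name__ == "__main__":
    income, debt = 45_000, 15_000  # hypothetical applicant
    print("debt-to-income ratio:", round(debt_to_income(debt, income), 3))
    print("income bin with cut-offs A:", discretize(income, CUTS_A, BINS))
    print("income bin with cut-offs B:", discretize(income, CUTS_B, BINS))
```

The same income lands in "Medium" under the first set of cut-offs and in "Low" under the second, which is why the choice of cut-off points can change what a downstream model learns.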

TranSys will perform the following tasks in this phase:
- Selection of variables
- Selection of rows
- Construction of new variables
- Transformation of variables

5. Model Building Phase

Model building is an iterative process. One needs to explore alternative models to find the one most useful in solving the business problem. Once you have decided on the type of prediction you want to make, you must choose a model type for making it. This could be a decision tree, a neural net, a proprietary method, or logistic regression. Based on the results of the initial models, you may want to build another model using the same technique but different parameters. No tool or technique is perfect for all data, and it is difficult, if not impossible, to be sure before you start which technique will work best. Quintegra chooses the strategy of building numerous models before settling on a satisfactory one that provides accurate results for the purpose.

6. Model Evaluation Phase

After building a model, you must evaluate its results. It is important to test the model in the real world, because there is no guarantee that a model that is accurate on the data used to build it reflects the real world. A valid model is not necessarily a correct model. In addition, the data used to build the model may fail to match the real world in some unknown way, leading to an incorrect model. For example, if a model is used to select a subset of a mailing list, do a test mailing to verify the model. If a model is used to predict credit risk, try the model on a small set of applicants before full deployment. TranSys first analyzes the risk associated with an incorrect model. The higher that risk, the more importance is given to constructing an experiment to check the model results.

7. Results Deployment Phase

Once a data mining model is built and validated, its results can be deployed to applications within the enterprise.
For example, the clusters identified by the model can be used to extract the rules that define the model and to make recommendations for new observations. TranSys may help develop the application in which the model is embedded. For example, the business rules component derived from the model can be integrated with a loan application system to facilitate the evaluation of an applicant.

Knowledge Discovery in Databases (KDD)

The KDD process extends data mining by consolidating the discovered knowledge and then incorporating it into operational systems. Knowledge integration has been achieved to a fairly large extent in e-business domains compared with other domains. For example, discovered knowledge about customer behavior (essentially from click-stream data) has been used effectively to improve sites, personalize pages, improve promotional and other features, and enhance the buying experience.

The value of the discovered knowledge lies in its appropriate use. The focus should be on using data and knowledge strategically to provide a competitive edge. TranSys, with its domain experience and practice excellence, can help clients minimize the risks of running KDD processes.

Data