Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product

Sagarika Prusty
Web Data Mining (ECT 584), Spring 2013
DePaul University, Chicago

Keywords: Data Mining, Direct Marketing, Clustering, Naïve Bayes, Decision Tree, Unbalanced data

Abstract: Direct marketing is a form of advertising in which businesses send promotional offers directly to individual customers. The success of such a campaign is measured as the percentage of customers who respond positively to it. Direct marketing is increasingly used by banks, insurance companies and the retail industry, and the success rates of these campaigns are normally below 10%. Data mining can improve the success rate significantly by identifying the customers who are most likely to buy the product, so that companies can target their campaigns at those hot prospects alone. This leads to a significant reduction in marketing cost and an increase in RoI (Return on Investment). In this paper we apply several data mining techniques to a banking dataset and illustrate how data mining can help the bank improve its direct marketing effort.

1. Introduction:
Direct marketing is practiced by businesses of all sizes, from the smallest start-up to the leaders of the Fortune 500. A well-executed direct advertising campaign can prove a positive return on investment by showing how many potential customers responded to a clear call to action. Direct marketing is attractive to many marketers because its results can be measured directly. For example, if a marketer sends out 1,000 solicitations by mail and 100 recipients respond to the promotion, the marketer can say with confidence that the campaign led directly to 10% direct responses. This metric is known as the 'response rate', and it is one of many clearly quantifiable success metrics employed by direct marketers. In contrast, general advertising relies on indirect measurements, such as awareness or engagement, since there is no direct response from the consumer. Measurement of results is a fundamental element of successful direct marketing.

Predictive modeling and other data mining techniques can help marketers improve the response rate significantly. For example, suppose a company has a marketing budget for sending promotional offers to 1,000 prospective customers. The company can get a much higher return on investment by sending the offers only to the 1,000 customers who are most likely to buy the product rather than to a random selection of 1,000 customers. Data mining techniques can help marketers identify those top 1,000 hot prospects. Data mining tools such as cluster analysis can also help marketers group their customers into different clusters or segments and address the needs of each segment accordingly.

2. KDD Process in Data mining:
KDD stands for Knowledge Discovery in Databases and refers to the broad process of discovering useful information from datasets. The term is often used interchangeably with data mining, but data mining is actually one step of the KDD process. KDD is a systematic approach that can broadly be divided into five or six steps. The process starts with understanding the business objective and the goals of the project. Next comes dataset identification, where you select the target data to be analyzed. More often than not, the available data is in raw format and needs to be cleaned; this step is called data preprocessing. Data preprocessing takes more time than any other step of the KDD process and, if done properly, makes the remaining steps easier.
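The response-rate arithmetic and the idea of targeting only the highest-propensity customers can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the helper names `response_rate` and `top_n_prospects` are hypothetical, and the propensity scores are made up; in practice they would come from a classifier such as those in Section 5.

```python
def response_rate(responses, contacted):
    """Fraction of contacted customers who responded positively."""
    if contacted <= 0:
        raise ValueError("contacted must be positive")
    return responses / contacted


def top_n_prospects(scores, n):
    """Ids of the n customers with the highest predicted propensity to buy."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# The example from the text: 1,000 mailings and 100 responses.
print(response_rate(100, 1000))  # 0.1

# Hypothetical propensity scores; a real model would supply these.
scores = {"c1": 0.12, "c2": 0.81, "c3": 0.05, "c4": 0.64, "c5": 0.33}
print(top_n_prospects(scores, 2))  # ['c2', 'c4']
```

With a fixed budget of N offers, calling `top_n_prospects(scores, N)` instead of sampling N customers at random is exactly the "hot prospects" strategy described above.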
Sometimes the data also needs to be transformed before a particular data mining technique can be applied. Transformation typically involves discretizing numeric attributes, recoding some attributes, or oversampling or reducing the sample size of the data. Once the data is preprocessed and transformed, it is ready for data modeling. Depending on the problem, a few techniques or algorithms are shortlisted and applied to the dataset; this is the data modeling step. In this step, data miners apply different techniques and look for patterns and useful information in the data. Often more than one technique is applied and the best approach is then selected. The best one can be either one of the established techniques or a hybrid approach, and the approach that gives the best result is selected as the final model. The final step is interpretation, in which the information discovered in the data mining step is presented in a format that the end user can understand.

In this paper we follow the KDD process. However, the data used here has already been preprocessed and cleaned, so minimal effort is required in the preprocessing step and the data can be used directly for data modeling. The KDD process can be represented by the simple diagram in Fig. 1.

Fig. 1: The KDD process (from Overview of Web Mining and E-Commerce Data Analytics by Bamshad Mobasher, DePaul University)

3. Bank Direct Marketing Data

3.1 Source of the data
The dataset used for this paper comes from a direct marketing campaign run by a Portuguese bank for one of its term deposit products. The campaign was conducted primarily through phone calls to the bank's existing customers, and often more than one call was required to assess whether a customer would subscribe to the product. The data is available in the public domain. The full dataset was described and analyzed in: S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM'2011, pp., Guimarães, Portugal, October, EUROSIS.

3.2 Understanding the dataset
For the purpose of this project, I randomly split the available data into two parts. The first, larger part was used to train the model. The second part, with far fewer instances (4,524), was used as the test dataset for validating model performance.

3.3 Attribute Information
The dataset relates to 17 campaigns that occurred between May 2008 and November. During these phone campaigns, the bank's customers were offered a long-term deposit with an attractive interest rate. For each contact a large number of attributes was recorded, and the output variable was whether the customer accepted the offer (yes) or not (no). Demographic details of each customer were then added to the campaign-related data. The available dataset had already been preprocessed: rows and columns with missing values had been removed, and only significant attributes were retained (17 in total, including the output variable). The attributes are listed below.

Attribute   Description                                     Values
age         Age (N)                                         Numeric
job         Job (C)                                         Technician, Management, Student, Maid, Retired, etc.
marital     Marital status (C)                              Single, Married
education   Education (C)                                   Primary, Secondary, Tertiary
default     Credit in default? (B)                          Yes or No
balance     Average yearly balance, in euros (N)            Numeric
housing     Has housing loan? (B)                           Yes or No
loan        Has personal loan? (B)                          Yes or No
contact     Contact communication type (C)                  Phone, Mobile, Unknown
day         Last contact day of the month (N)               1, 2, 3, ..., 29, 30, 31
month       Last contact month of the year (C)              Jan, Feb, Mar, ..., Nov, Dec
duration    Last contact duration, in seconds (N)           Numeric
campaign    Number of contacts during the campaign (N)      Numeric
pdays       Number of days since last contact (N)           Numeric
poutcome    Outcome of the previous marketing campaign (B)  Yes or No

N = Numeric, C = Categorical, B = Binary

As mentioned above, the output variable (y) is whether the customer subscribed to the term deposit product (yes or no). Only 4,763 instances in the training dataset have y = yes, which translates to a 10.5% response rate. The dataset is therefore unbalanced: the class to be predicted has a highly skewed distribution (89.5% no cases and 10.5% yes cases), which must be taken into account during data modeling.

4 Experimental Environment
The software used for data modeling in this project is Weka, an open source data mining tool that can be downloaded for free.

5 Data Modeling
The idea is to experiment with different machine learning algorithms and select the one that gives the best results.

5.1 Model Evaluation criteria
For model evaluation I use three complementary approaches. The first is the F measure, the harmonic mean of precision and recall, given by Formula 1. An F measure close to 1 indicates a good model.
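The random hold-out split of Section 3.2 and the class-imbalance check of Section 3.3 can be sketched as follows. This is a minimal illustration under stated assumptions: `holdout_split` and `class_distribution` are hypothetical helper names, and the labels are synthetic, chosen only to reproduce the 10.5% yes rate quoted above; the paper's actual split used the real dataset with a 4,524-instance test set.

```python
import random
from collections import Counter


def holdout_split(rows, test_size, seed=0):
    """Shuffle a copy of `rows` and hold out `test_size` of them for testing."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]


def class_distribution(labels):
    """Fraction of instances in each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Synthetic labels with the 10.5% yes rate quoted above (not the real data).
labels = ["yes"] * 105 + ["no"] * 895
train, test = holdout_split(labels, test_size=100, seed=42)
print(len(train), len(test))       # 900 100
print(class_distribution(labels))  # {'yes': 0.105, 'no': 0.895}
```

Checking the class distribution before modeling is what reveals that a classifier could reach about 89.5% accuracy by always predicting no, which is why accuracy alone is a poor criterion here.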

Formula 1 (F measure):

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

where
TP = True Positive (actually yes and classified as yes); we want a high TP rate
FP = False Positive (actually no but classified as yes)
FN = False Negative (actually yes but classified as no)

Lift Chart
The second measure of model performance is a cumulative lift chart, from which the lift value at a given sampling level can be read. The lift chart is created by sorting the data on the predicted response (all predicted yes cases on top) and calculating the cumulative positive (actual) response for these hot prospects. Lift is given by Formula 2:

$$\text{Lift} = \frac{\text{response rate based on the predicted model}}{\text{response rate based on random calling}}$$

A higher lift value indicates better performance.

Validation using test data set
The third measure is how well the model performs on the test data set. This is measured by the TPR (true positive rate) of class yes in the test set, i.e. how many of the actual yes cases were classified as yes by the classifier. The higher the better.

I experimented with four models. The results are described below.

5.2 Models

Naïve Bayes:
I started with the Naïve Bayes model, a long-established method of classification and predictor selection known for its simplicity and stability. I performed a 10-fold cross-validation on the instances available in the training data set. The model had a decent accuracy of 88% but a True Positive Rate (TPR) of only 0.53 (F measure of 0.508) for yes cases. This was not a very good result, but I went ahead to check how the model performs on the test dataset (containing 4,524 cases). It failed badly: it classified all the test cases as no. This is because the dataset is highly unbalanced and has only about 10% yes cases.

Resolving the case of unbalanced data:
To resolve the issue of the unbalanced dataset, I reduced the number of no cases in the training dataset. I randomly selected 4,763 no cases and deleted the rest of the no cases from the training dataset. The training dataset then had an equal number of yes and no cases and was perfectly balanced.

Naïve Bayes with modified training set:
After modifying the training dataset, I ran the Naïve Bayes algorithm again, and this time the result was much better. The summary of the stratified cross-validation is shown in Fig. 2. Although the overall accuracy of the model fell to 78%, the TPR and F measure for yes cases improved significantly. When validated on the test cases, the model correctly classified 419 of the 526 yes cases, which translates to a TPR of 80%.

Fig. 2: Stratified cross-validation summary for Naïve Bayes on the balanced training set
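Formula 1 and the test-set TPR can be computed directly from confusion-matrix counts. The sketch below uses hypothetical helper names; the only real figures it uses are the yes-class test counts reported in this section and the next (419, 252 and 464 out of 526), and the F-measure call uses made-up counts purely to exercise the formula.

```python
def precision(tp, fp):
    """Share of predicted yes cases that are actually yes."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual yes cases predicted as yes; this is the TPR of class yes."""
    return tp / (tp + fn)


def f_measure(tp, fp, fn):
    """Formula 1: harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)


# Test-set TPR of class yes: correctly classified yes cases out of 526.
yes_total = 526
for model, hits in [
    ("Naive Bayes (balanced training set)", 419),
    ("Decision tree (original training set)", 252),
    ("Decision tree (balanced training set)", 464),
]:
    print(f"{model}: TPR = {recall(hits, yes_total - hits):.3f}")

# Hypothetical confusion counts, only to exercise Formula 1.
print(f_measure(tp=3, fp=1, fn=1))  # 0.75
```

Because the harmonic mean is dominated by the smaller of precision and recall, a model that predicts no for every case gets an F measure of zero for class yes (precision is then undefined), even though its accuracy is high.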

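The rebalancing step described above is random undersampling of the majority class. A minimal sketch, assuming each instance is a dict holding the output variable under the key `"y"` (a hypothetical layout; `undersample` is likewise a hypothetical helper, not a Weka function):

```python
import random


def undersample(rows, label_key="y", majority="no", seed=0):
    """Randomly drop majority-class rows until both classes have equal counts."""
    rng = random.Random(seed)
    majority_rows = [r for r in rows if r[label_key] == majority]
    minority_rows = [r for r in rows if r[label_key] != majority]
    # Keep only as many majority rows as there are minority rows.
    kept = rng.sample(majority_rows, k=min(len(minority_rows), len(majority_rows)))
    balanced = minority_rows + kept
    rng.shuffle(balanced)
    return balanced


# Synthetic training rows: 10 yes and 90 no.
rows = [{"id": i, "y": "yes" if i < 10 else "no"} for i in range(100)]
balanced = undersample(rows)
print(sum(r["y"] == "yes" for r in balanced), sum(r["y"] == "no" for r in balanced))  # 10 10
```

Undersampling discards most of the no cases; oversampling the minority class, mentioned in Section 2 as another transformation, keeps all the data at the cost of duplicated yes cases.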
Decision Tree (C4.5 algorithm):
Next, I wanted to check how a decision tree handles the unbalanced data, so I ran the J48 method (Weka's implementation of the C4.5 algorithm). The summary of the model is shown in Fig. 3. Overall accuracy of the model is 94%, with a decent TPR and F measure for the yes class, and the ROC area is 0.94. On the test data set the model performed much better than the Naïve Bayes model trained on the unbalanced data, correctly classifying 252 of the 526 yes cases (48%). However, the result was not as good as that of Naïve Bayes on the modified training set.

Decision Tree (C4.5 algorithm) with modified training set:
Since the decision tree gave better results than Naïve Bayes on the original training set, as a final trial I applied the C4.5 algorithm to the modified (balanced) training dataset. The summary result is shown in Fig. 3. Of the four models I experimented with, this one gave the best result. Overall accuracy stands at 90%, recall and precision for yes cases are the best of the four models, and the F measure for the yes class is 0.9. When validated on the test data set, it correctly classified 464 of the 526 yes cases (88%).

Fig. 3: Decision tree (J48) summary
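C4.5 (J48 in Weka) chooses each split by gain ratio. The sketch below computes that criterion for a single categorical attribute; it covers only the split-selection step and is not a reimplementation of J48, which also handles numeric thresholds, missing values and pruning. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 gain ratio of splitting on one categorical attribute.

    values[i] is the attribute value of instance i and labels[i] its class.
    """
    total = len(labels)
    partitions = defaultdict(list)
    for value, label in zip(values, labels):
        partitions[value].append(label)
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes that have many distinct values.
    split_info = -sum(len(p) / total * math.log2(len(p) / total) for p in partitions.values())
    return info_gain / split_info if split_info > 0 else 0.0


labels = ["yes", "yes", "no", "no"]
print(gain_ratio(["a", "a", "b", "b"], labels))  # 1.0: the attribute separates the classes
print(gain_ratio(["a", "b", "a", "b"], labels))  # 0.0: the attribute carries no information
```

At each node the tree splits on the attribute with the highest gain ratio, which is how attributes such as duration or poutcome can end up near the root.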

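Both classifiers above were evaluated with 10-fold cross-validation on the training set, and the Weka summaries refer to stratified cross-validation. A minimal sketch of how stratified folds can be formed, so that every fold keeps roughly the overall yes/no mix; `stratified_folds` and `cross_validation_splits` are hypothetical names, and Weka's own fold assignment may differ in detail.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split instance indices into k folds that each keep the overall class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold gets its share.
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) once per fold."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


labels = ["yes"] * 20 + ["no"] * 80
for train, test in cross_validation_splits(labels):
    print(len(train), len(test), sum(labels[i] == "yes" for i in test))  # 90 10 2
```

Each instance is tested exactly once, and the ten per-fold results are averaged into the cross-validated accuracy, TPR and F measure.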
5.3 Model comparison
Since Naïve Bayes with the unbalanced training data set did not produce the desired result, that model has been left out of the comparison. The remaining three models were compared on precision, recall, F measure and ROC area for class yes, overall accuracy, and performance on the test data:

Model / Parameter   Naïve Bayes (modified training set)   Decision Tree (C4.5)   Decision Tree (modified training set)
Overall accuracy    78.15%                                93.7%                  89.9%
TPR of test set     80%                                   48%                    88%

Except for overall accuracy (which is not a very good indicator of model performance on unbalanced data anyway), the decision tree with the balanced training dataset outperformed the other two models.

Cumulative Lift Chart
Fig. 4 shows the cumulative lift chart for all three models. The decision tree with the modified training set gives the best result throughout, except at the lower left of the chart (samples of fewer than 379 customers), where the decision tree trained on the unbalanced dataset gave a better response rate.

Calculating the lift value when only 30% of customers are to be called:
30% of the total instances in the test set = 0.3 × 4524 ≈ 1357
Lift = response rate (top 30% of customers) / response rate (random calling)

                 Random calling   Naïve Bayes (modified dataset)   Decision Tree   Decision Tree (modified dataset)
Response rate    10.6%            30.4%                            22.4%           34.4%
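Formula 2 applied to the 30% response rates listed above gives the lift of each model at that calling level. The snippet only divides the reported rates by the random-calling rate; because those rates are rounded, the printed lift values are approximations derived here rather than figures taken from the paper.

```python
def lift(model_rate, random_rate):
    """Formula 2: response rate of the model-selected customers / random-calling rate."""
    return model_rate / random_rate


test_size = 4524
called = round(0.3 * test_size)  # customers called at the 30% level
print(called)  # 1357

random_rate = 0.106  # random-calling response rate from the table
reported_rates = {
    "Naive Bayes (modified dataset)": 0.304,
    "Decision Tree": 0.224,
    "Decision Tree (modified dataset)": 0.344,
}
for model, rate in reported_rates.items():
    print(f"{model}: lift = {lift(rate, random_rate):.2f}")
```

This prints lifts of about 2.87, 2.11 and 3.25 respectively: calling the top 30% ranked by the balanced decision tree finds a little over three times as many subscribers as calling the same number of customers at random.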

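One way to compute the points of a cumulative lift chart like Fig. 4 is to sort customers by predicted score, then compare the response rate of each top slice with the overall rate. A minimal sketch with a hypothetical `cumulative_lift` helper and made-up scores for ten customers:

```python
def cumulative_lift(scores, actual, steps=10):
    """Lift at each cumulative sampling level, customers sorted by predicted score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    overall_rate = sum(actual) / len(actual)  # response rate of random calling
    points = []
    for step in range(1, steps + 1):
        n = max(1, round(step * len(order) / steps))  # size of the top slice
        hits = sum(actual[i] for i in order[:n])
        points.append((step / steps, (hits / n) / overall_rate))
    return points


# Hypothetical scores for 10 customers; actual is 1 for a real "yes".
scores = [0.95, 0.90, 0.70, 0.65, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
actual = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
for level, value in cumulative_lift(scores, actual):
    print(f"top {level:.0%}: lift {value:.2f}")
```

The lift always falls to 1.0 at 100%, since calling everyone is the same as random calling; a good model keeps the curve high at the small sampling levels a limited budget can afford.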
Fig. 4: Cumulative lift chart for the three models

6 Clustering
Cluster analysis, or clustering, is the task of grouping a set of objects so that objects in the same group (called a cluster) are more similar, in some sense, to each other than to those in other groups. Clustering can be informative and can be used very effectively in direct marketing: it can identify the characteristics of customers who are more likely to subscribe to or buy a product, and companies can use this information to customize products for different customer segments.

6.1 K means Clustering:
Since the balanced dataset gave better results in the classification exercise, I used the same dataset for clustering. I applied the K-means clustering technique in Weka, experimented with different numbers of clusters, and obtained good results with 8 clusters. The result is shown in Fig. 5.
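As a rough illustration of what K-means does, here is a minimal Lloyd's-iteration sketch. It is not Weka's implementation: it handles numeric attributes only (no categorical attributes such as job or education), and the six two-dimensional points and k=2 are a toy example rather than the bank data with 8 clusters.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on numeric vectors using Euclidean distance."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # start from k random points
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The number of clusters is an input, which is why the text describes trying several values before settling on 8; the result also depends on the random starting centroids, so different seeds can give different clusterings on real data.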

Fig. 5: K-means clustering result (8 clusters)

Interpretation of Clustering Analysis
Clusters with y = yes represent groups of customers who subscribed to the term deposit product, and those with y = no are customers who did not. From the clustering result, clusters 0, 2, 3 and 4 represent customers who subscribed to the term deposit product offered by the bank. For example, customers in cluster 4 have management jobs, a mean age of 41.7 years, are married, have tertiary education, have an average balance of 1,952 euros, and subscribed to the term deposit product.

Clusters 3 and 5 look very similar, both representing married customers with admin jobs and secondary education. However, cluster 3 contains customers who subscribed to the term deposit, whereas cluster 5 represents customers who did not. Looking more closely, the average duration of the campaign call is 435 seconds in cluster 3, against 295 seconds in cluster 5. This suggests that customers who are interested in the product stay on the call longer, possibly to enquire about it in detail. Similarly, pdays (the number of days that passed after the client was last contacted in a previous campaign) averages 47.6 for customers in cluster 3, against 12.7 in cluster 5. The low average pdays is possibly because customers who were not contacted before have a value of -1 in the data. The other clusters can be analyzed for their specific characteristics in the same way.
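The interpretation above reads cluster-level summaries such as the mean call duration and the dominant value of y. A minimal sketch of that profiling step, assuming rows are dicts keyed by attribute name; `profile_clusters` is a hypothetical helper, and the four rows are made up, with values chosen only so the means echo the 435 s versus 295 s comparison.

```python
from collections import Counter, defaultdict
from statistics import mean


def profile_clusters(rows, assignment, numeric_key, label_key="y"):
    """For each cluster: mean of one numeric attribute and the most common class."""
    groups = defaultdict(list)
    for row, cluster in zip(rows, assignment):
        groups[cluster].append(row)
    profile = {}
    for cluster, members in sorted(groups.items()):
        majority_label, _ = Counter(r[label_key] for r in members).most_common(1)[0]
        profile[cluster] = (mean(r[numeric_key] for r in members), majority_label)
    return profile


# Made-up rows whose means echo the cluster 3 vs cluster 5 comparison above.
rows = [
    {"duration": 400, "y": "yes"},
    {"duration": 470, "y": "yes"},
    {"duration": 280, "y": "no"},
    {"duration": 310, "y": "no"},
]
assignment = [3, 3, 5, 5]
print(profile_clusters(rows, assignment, "duration"))
```

Running the same profile for other attributes (pdays, balance, age) gives the side-by-side comparison used to contrast clusters 3 and 5.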

7 Conclusion
Using the banking dataset, we saw how data mining tools can be used in direct marketing and how data mining techniques can help companies get a better return on investment from their marketing budget. Algorithms such as Naïve Bayes and decision trees can classify customers as good or bad prospects (based on their propensity to buy a product), and clustering techniques can help segment customers and identify the characteristics or attributes of good customers. Many other techniques, such as association rules and other classification algorithms, have not been discussed in this paper. The appropriate tool depends on the business objective, and different models should be tried and tested to find the best-performing one, which can then be applied to the direct marketing data.

8 Citation
[Moro et al., 2011] S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM'2011, pp., Guimarães, Portugal, October, EUROSIS.
Bamshad Mobasher. Overview of Web Mining and E-Commerce Data Analytics. DePaul University.


More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques. International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

Overview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set

Overview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set Overview Evaluation Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes training set, validation set, test set holdout, stratification

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Lecture 15 - ROC, AUC & Lift Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-17-AUC

More information

1 Choosing the right data mining techniques for the job (8 minutes,

1 Choosing the right data mining techniques for the job (8 minutes, CS490D Spring 2004 Final Solutions, May 3, 2004 Prof. Chris Clifton Time will be tight. If you spend more than the recommended time on any question, go on to the next one. If you can t answer it in the

More information

Pentaho Data Mining Last Modified on January 22, 2007

Pentaho Data Mining Last Modified on January 22, 2007 Pentaho Data Mining Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org

More information

Determining optimum insurance product portfolio through predictive analytics BADM Final Project Report

Determining optimum insurance product portfolio through predictive analytics BADM Final Project Report 2012 Determining optimum insurance product portfolio through predictive analytics BADM Final Project Report Dinesh Ganti(61310071), Gauri Singh(61310560), Ravi Shankar(61310210), Shouri Kamtala(61310215),

More information

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 UNDER THE GUIDANCE Dr. N.P. DHAVALE, DGM, INFINET Department SUBMITTED TO INSTITUTE FOR DEVELOPMENT AND RESEARCH IN BANKING TECHNOLOGY

More information

Direct Marketing When There Are Voluntary Buyers

Direct Marketing When There Are Voluntary Buyers Direct Marketing When There Are Voluntary Buyers Yi-Ting Lai and Ke Wang Simon Fraser University {llai2, wangk}@cs.sfu.ca Daymond Ling, Hua Shi, and Jason Zhang Canadian Imperial Bank of Commerce {Daymond.Ling,

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Credibility: Evaluating what s been learned Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 5 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Issues: training, testing,

More information

Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

More information

Targeted Marketing, KDD Cup and Customer Modeling

Targeted Marketing, KDD Cup and Customer Modeling Targeted Marketing, KDD Cup and Customer Modeling Outline Direct Marketing Review: Evaluation: Lift, Gains KDD Cup 1997 Lift and Benefit estimation Privacy and Data Mining 2 Direct Marketing Paradigm Find

More information

Mining the Software Change Repository of a Legacy Telephony System

Mining the Software Change Repository of a Legacy Telephony System Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,

More information

IBMMS DECISION SUPPORT TOOL FOR MANAGEMENT OF BANK TELEMARKETING CAMPAIGNS

IBMMS DECISION SUPPORT TOOL FOR MANAGEMENT OF BANK TELEMARKETING CAMPAIGNS IBMMS DECISION SUPPORT TOOL FOR MANAGEMENT OF BANK TELEMARKETING CAMPAIGNS Ali KELES and Ayturk KELES Department of Computer Education and Instructional Technology, Faculty of Education Agri Ibrahim Cecen

More information

Predicting earning potential on Adult Dataset

Predicting earning potential on Adult Dataset MSc in Computing, Business Intelligence and Data Mining stream. Business Intelligence and Data Mining Applications Project Report. Predicting earning potential on Adult Dataset Submitted by: xxxxxxx Supervisor:

More information

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

More information

Comparison of Data Mining Techniques used for Financial Data Analysis

Comparison of Data Mining Techniques used for Financial Data Analysis Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract

More information

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte,

More information

Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data

Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data Fifth International Workshop on Computational Intelligence & Applications IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, November 10, 11 & 12, 2009 Extension of Decision Tree Algorithm for Stream

More information

How To Understand The Impact Of A Computer On Organization

How To Understand The Impact Of A Computer On Organization International Journal of Research in Engineering & Technology (IJRET) Vol. 1, Issue 1, June 2013, 1-6 Impact Journals IMPACT OF COMPUTER ON ORGANIZATION A. D. BHOSALE 1 & MARATHE DAGADU MITHARAM 2 1 Department

More information

CSC 177 Fall 2014 Team Project Final Report

CSC 177 Fall 2014 Team Project Final Report CSC 177 Fall 2014 Team Project Final Report Project Title, Data Mining on Farmers Market Data Instructor: Dr. Meiliu Lu Team Members: Yogesh Isawe Kalindi Mehta Aditi Kulkarni CSc 177 DM Project Cover

More information

T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577

T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577 T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier Santosh Tirunagari : 245577 January 20, 2011 Abstract This term project gives a solution how to classify an email as spam or

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

IBM SPSS Direct Marketing

IBM SPSS Direct Marketing IBM Software IBM SPSS Statistics 19 IBM SPSS Direct Marketing Understand your customers and improve marketing campaigns Highlights With IBM SPSS Direct Marketing, you can: Understand your customers in

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for

More information

Predicting borrowers chance of defaulting on credit loans

Predicting borrowers chance of defaulting on credit loans Predicting borrowers chance of defaulting on credit loans Junjie Liang (junjie87@stanford.edu) Abstract Credit score prediction is of great interests to banks as the outcome of the prediction algorithm

More information

Getting Even More Out of Ensemble Selection

Getting Even More Out of Ensemble Selection Getting Even More Out of Ensemble Selection Quan Sun Department of Computer Science The University of Waikato Hamilton, New Zealand qs12@cs.waikato.ac.nz ABSTRACT Ensemble Selection uses forward stepwise

More information

Predicting Flight Delays

Predicting Flight Delays Predicting Flight Delays Dieterich Lawson jdlawson@stanford.edu William Castillo will.castillo@stanford.edu Introduction Every year approximately 20% of airline flights are delayed or cancelled, costing

More information

A Logistic Regression Approach to Ad Click Prediction

A Logistic Regression Approach to Ad Click Prediction A Logistic Regression Approach to Ad Click Prediction Gouthami Kondakindi kondakin@usc.edu Satakshi Rana satakshr@usc.edu Aswin Rajkumar aswinraj@usc.edu Sai Kaushik Ponnekanti ponnekan@usc.edu Vinit Parakh

More information

A Property & Casualty Insurance Predictive Modeling Process in SAS

A Property & Casualty Insurance Predictive Modeling Process in SAS Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

Microsoft Azure Machine learning Algorithms

Microsoft Azure Machine learning Algorithms Microsoft Azure Machine learning Algorithms Tomaž KAŠTRUN @tomaz_tsql Tomaz.kastrun@gmail.com http://tomaztsql.wordpress.com Our Sponsors Speaker info https://tomaztsql.wordpress.com Agenda Focus on explanation

More information

Data Mining. Nonlinear Classification

Data Mining. Nonlinear Classification Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

More information

Understanding Your Customer Journey by Extending Adobe Analytics with Big Data

Understanding Your Customer Journey by Extending Adobe Analytics with Big Data SOLUTION BRIEF Understanding Your Customer Journey by Extending Adobe Analytics with Big Data Business Challenge Today s digital marketing teams are overwhelmed by the volume and variety of customer interaction

More information

A Basic Guide to Modeling Techniques for All Direct Marketing Challenges

A Basic Guide to Modeling Techniques for All Direct Marketing Challenges A Basic Guide to Modeling Techniques for All Direct Marketing Challenges Allison Cornia Database Marketing Manager Microsoft Corporation C. Olivia Rud Executive Vice President Data Square, LLC Overview

More information

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through

More information

Case Study: Direct Mail Subscriptions Tests

Case Study: Direct Mail Subscriptions Tests Case Study: Direct Mail Subscriptions Tests Fast, low-cost creative and price tests leading to a 70% jump in response Introduction Magazine publishers depend on direct mail to grow their subscriber base.

More information

Data Mining for Fun and Profit

Data Mining for Fun and Profit Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools

More information

Business Analytics using Data Mining Project Report. Optimizing Operation Room Utilization by Predicting Surgery Duration

Business Analytics using Data Mining Project Report. Optimizing Operation Room Utilization by Predicting Surgery Duration Business Analytics using Data Mining Project Report Optimizing Operation Room Utilization by Predicting Surgery Duration Project Team 4 102034606 WU, CHOU-CHUN 103078508 CHEN, LI-CHAN 102077503 LI, DAI-SIN

More information

Predicting Student Performance by Using Data Mining Methods for Classification

Predicting Student Performance by Using Data Mining Methods for Classification BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 1 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0006 Predicting Student Performance

More information

Data Mining for Direct Marketing: Problems and

Data Mining for Direct Marketing: Problems and Data Mining for Direct Marketing: Problems and Solutions Charles X. Ling and Chenghui Li Department of Computer Science The University of Western Ontario London, Ontario, Canada N6A 5B7 Tel: 519-661-3341;

More information

Enhancing Quality of Data using Data Mining Method

Enhancing Quality of Data using Data Mining Method JOURNAL OF COMPUTING, VOLUME 2, ISSUE 9, SEPTEMBER 2, ISSN 25-967 WWW.JOURNALOFCOMPUTING.ORG 9 Enhancing Quality of Data using Data Mining Method Fatemeh Ghorbanpour A., Mir M. Pedram, Kambiz Badie, Mohammad

More information

Performance Measures for Machine Learning

Performance Measures for Machine Learning Performance Measures for Machine Learning 1 Performance Measures Accuracy Weighted (Cost-Sensitive) Accuracy Lift Precision/Recall F Break Even Point ROC ROC Area 2 Accuracy Target: 0/1, -1/+1, True/False,

More information

PREDICTIVE DATA MINING ON WEB-BASED E-COMMERCE STORE

PREDICTIVE DATA MINING ON WEB-BASED E-COMMERCE STORE PREDICTIVE DATA MINING ON WEB-BASED E-COMMERCE STORE Jidi Zhao, Tianjin University of Commerce, zhaojidi@263.net Huizhang Shen, Tianjin University of Commerce, hzshen@public.tpt.edu.cn Duo Liu, Tianjin

More information

Modeling Lifetime Value in the Insurance Industry

Modeling Lifetime Value in the Insurance Industry Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting

More information

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

More information

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an

More information

Working with telecommunications

Working with telecommunications Working with telecommunications Minimizing churn in the telecommunications industry Contents: 1 Churn analysis using data mining 2 Customer churn analysis with IBM SPSS Modeler 3 Types of analysis 3 Feature

More information

Numerical Algorithms Group

Numerical Algorithms Group Title: Summary: Using the Component Approach to Craft Customized Data Mining Solutions One definition of data mining is the non-trivial extraction of implicit, previously unknown and potentially useful

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 10 Sajjad Haider Fall 2012 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right

More information

Maximizing Return and Minimizing Cost with the Decision Management Systems

Maximizing Return and Minimizing Cost with the Decision Management Systems KDD 2012: Beijing 18 th ACM SIGKDD Conference on Knowledge Discovery and Data Mining Rich Holada, Vice President, IBM SPSS Predictive Analytics Maximizing Return and Minimizing Cost with the Decision Management

More information

Understanding Characteristics of Caravan Insurance Policy Buyer

Understanding Characteristics of Caravan Insurance Policy Buyer Understanding Characteristics of Caravan Insurance Policy Buyer May 10, 2007 Group 5 Chih Hau Huang Masami Mabuchi Muthita Songchitruksa Nopakoon Visitrattakul Executive Summary This report is intended

More information

Scoring the Data Using Association Rules

Scoring the Data Using Association Rules Scoring the Data Using Association Rules Bing Liu, Yiming Ma, and Ching Kian Wong School of Computing National University of Singapore 3 Science Drive 2, Singapore 117543 {liub, maym, wongck}@comp.nus.edu.sg

More information