Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product
Sagarika Prusty
Web Data Mining (ECT 584), Spring 2013
DePaul University, Chicago

Keywords: Data Mining, Direct Marketing, Clustering, Naïve Bayes, Decision Tree, Unbalanced Data

Abstract: Direct marketing is a form of advertising in which businesses send promotional offers addressed directly to a customer. The success of such a campaign is measured as the percentage of customers who respond positively to it. Direct marketing is increasingly used by banks, insurance companies and the retail industry. Success rates of these campaigns are normally less than 10%. Data mining can help industries improve the success rate significantly by identifying the customers who are most likely to buy the product. Companies can then target their campaigns at those hot prospects alone. This leads to a significant reduction in marketing cost and increases the return on investment (RoI). In this paper we apply several data mining techniques to a banking dataset and illustrate how data mining can help the bank improve its direct marketing effort.

1. Introduction:

Direct marketing is practiced by businesses of all sizes, from the smallest start-up to the leaders of the Fortune 500. A well-executed direct advertising campaign can demonstrate a positive return on investment by showing how many potential customers responded to a clear call to action. Direct marketing is attractive to many marketers because its results can be measured directly. For example, if a marketer sends out 1,000 solicitations by mail and 100 recipients respond to the promotion, the marketer can say with confidence that the campaign led directly to
10% direct responses. This metric is known as the 'response rate', and it is one of many clearly quantifiable success metrics employed by direct marketers. In contrast, general advertising uses indirect measurements, such as awareness or engagement, since there is no direct response from a consumer. Measurement of results is a fundamental element of successful direct marketing.

Predictive modeling and other data mining techniques can help marketers improve the response rate significantly. For example, suppose a company has a marketing budget for sending promotional offers to 1,000 prospective customers. The company can get a much higher return on investment by sending the offers only to the 1,000 customers who are most likely to buy the product than by selecting 1,000 customers at random. Data mining techniques can help marketers identify those top 1,000 hot prospects. Data mining tools such as cluster analysis can also help marketers group their customers into clusters or segments and then address the needs of each segment accordingly.

2. KDD Process in Data mining:

KDD stands for Knowledge Discovery in Databases and refers to the broad process of discovering useful information in datasets. The term is often used interchangeably with data mining, but data mining is actually one part of the KDD process. KDD is a systematic approach that can broadly be divided into five or six steps. The process starts with understanding the business objective and the goals of the project. Next comes dataset identification, in which the target data to be analyzed is selected. More often than not, the available dataset is in raw format and needs to be pre-processed or cleaned. This step is called data preprocessing. Data preprocessing typically takes the most time in the entire KDD process and, if done properly, makes the remaining steps easier.
Sometimes the data also needs to be transformed before a particular data mining technique can be applied. Transformation typically involves discretizing numeric attributes, recoding some of the attributes, or oversampling or reducing the sample size of the data. Once the data is preprocessed and transformed, it is ready for data modeling. Depending on the needs of the problem, a few techniques or algorithms are shortlisted and then applied to the dataset. In this data modeling step, data miners apply different techniques and look for patterns and useful information in the data. Sometimes more than one technique is applied and the best approach is then selected. The best one can either be
one of the established techniques or a hybrid approach. The approach that gives the best result is then selected as the final model. The final step is interpretation, in which the information discovered in the data mining step is presented in a format that the end user can understand.

In this paper we follow the KDD process. However, the data used for this paper is already preprocessed and cleaned, so minimal effort is required in the preprocessing step and the data can be used for data modeling straight away. The KDD process can be represented through this simple diagram.

Fig. 1 (Overview of Web Mining and E-Commerce Data Analytics by Bamshad Mobasher, DePaul University)

3. Bank Direct Marketing Data

3.1 Source of the data

The dataset used for this paper is from a direct marketing campaign of a Portuguese bank for one of its term deposit products. The primary means of campaigning was phone calls to the bank's existing customers. Often, more than one phone call was required to assess whether a customer would subscribe to the product. The data is available in the public domain and can be downloaded. The full dataset was described and analyzed in: S. Moro,
R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM'2011, Guimarães, Portugal, October 2011, EUROSIS.

3.2 Understanding the dataset

For the purpose of my project, I randomly split the available dataset into two parts. The first, larger part was used for training the model. The second part, with far fewer instances (4524), was used as the test data set for validating model performance.

3.3 Attribute Information

The dataset is related to 17 campaigns that occurred between May 2008 and November. During these phone campaigns, customers of the bank were offered a long-term deposit application with an attractive interest rate. For each contact, a large number of attributes was recorded, and the output variable was whether the customer accepted the offer or not (Yes indicating that the customer accepted the offer and No indicating a negative response). Demographic details of each customer were then added to the campaign-related data.

The dataset as made available has already been preprocessed. Rows and columns with missing values have been removed, and only significant attributes are present (totaling 17 including the output variable). The list of attributes is as follows:

Attribute name  Description                                     Value
age             Age (N)                                         Numeric
job             Job (C)                                         Technician, Management, Student, Maid, Retired, etc.
marital         Marital status (C)                              Single, Married
education       Education (C)                                   Primary, Secondary, Tertiary
default         Credit in default? (B)                          Yes or No
balance         Average yearly balance (euros) (N)              Numeric
housing         Has housing loan? (B)                           Yes or No
loan            Has personal loan? (B)                          Yes or No
contact         Contact communication type (C)                  Phone, Mobile, Unknown
day             Last contact day of the month (N)               1, 2, 3, 4, ..., 29, 30, 31
month           Last contact month of the year (N)              Jan, Feb, Mar, ..., Nov, Dec
duration        Last contact duration, in seconds (N)           Numeric
campaign        Number of contacts during the campaign (N)      Numeric
pdays           No. of days passed since last contact (N)       Numeric
poutcome        Outcome of the previous marketing campaign (B)  Yes or No

N = Numeric, C = Categorical, B = Binary

The output variable, as mentioned above, is whether the customer subscribed to the term deposit product or not (Yes or No). Of the instances in the training dataset, only 4763 have the output variable (y) as Yes. This translates to a 10.5% response rate. The dataset is unbalanced, since the variable to be classified has a very skewed distribution (89.5% No cases and 10.5% Yes cases). This is an important factor that needs to be considered during data modeling.

4 Experimental Environment

The software used for data modeling in this project is Weka. Weka is an open source data mining tool and can be downloaded for free.

5 Data Modeling

The idea is to experiment with different machine learning algorithms and select the one giving the best results.

5.1 Model Evaluation criteria

For model evaluation, I use three different approaches, which complement each other. The first is the F measure, which is the harmonic mean of precision and recall and is represented by Formula 1 below. An F measure close to 1 indicates a good model.
F Measure

F = 2TP / (2TP + FP + FN)    - Formula 1

where
TP = True Positive (actually yes and classified as yes); we want a high TP rate
FP = False Positive (actually no but classified as yes)
FN = False Negative (actually yes but classified as no)

Lift Chart

The second measure for evaluating model performance is a cumulative lift chart, from which the lift value at a given sampling level can be read. Lift is represented by Formula 2 below. The lift chart is created by calculating the cumulative positive (actual) response for the hot prospects, with the data sorted by predicted response so that all predicted yes cases are on top.

Lift = Response rate based on the predictive model / Response rate based on random calling    - Formula 2

A higher lift value indicates better performance.

Validation using test data set

The third is how well the model performs on the test data set. This is measured by the true positive rate (TPR) of class yes in the test set, i.e. how many of the actual yes cases were classified as yes by the classifier. The higher the better.

I experimented with four models. The results are described below.

5.2 Models

Naïve Bayes:

I started with the Naïve Bayes model, which is an old method of classification and predictor selection and is known for its simplicity and stability. I performed a 10-fold cross-validation on the instances available in the training data set. The model had a decent accuracy level of 88% but
had a true positive rate (TPR) of only 0.53 (F measure of 0.508) for yes cases. This was not a very good result, but I went ahead to check how the model performed on the test dataset (containing 4524 cases). It failed miserably: it classified all the test cases as No. This is because the dataset is highly unbalanced, with only about 10% yes cases.

Resolving the case of unbalanced data

To resolve the issue of the unbalanced dataset, I reduced the number of No cases in the training dataset. I randomly selected 4763 No cases and deleted the rest of the No cases from the training dataset. The training dataset now had an equal number of Yes and No cases and was perfectly balanced.

Naïve Bayes with modified training set

After modifying the training dataset, I ran the Naïve Bayes algorithm once again, and this time the result was much better. The summary of the stratified cross-validation is shown in Fig. 2 below. Even though the overall accuracy of the model dropped to 78%, the TPR and F measure for Yes cases improved significantly. When validated on the test cases, the model correctly classified 419 of the 526 yes cases, which translates to a TPR of 80%.

Fig. 2
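The balancing step described above (keeping every Yes case and a random sample of equally many No cases) can be sketched in a few lines of Python. This is only an illustration of random undersampling, not the procedure actually used in Weka; the function name `undersample_majority`, the row layout and the toy data are assumptions made for the example.

```python
import random


def undersample_majority(rows, label_key="y", minority="yes", seed=42):
    """Return a balanced copy of rows by randomly dropping majority-class rows."""
    minority_rows = [r for r in rows if r[label_key] == minority]
    majority_rows = [r for r in rows if r[label_key] != minority]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    # Keep as many majority rows as there are minority rows.
    kept_majority = rng.sample(majority_rows, k=len(minority_rows))
    balanced = minority_rows + kept_majority
    rng.shuffle(balanced)
    return balanced


# Toy data with a 10% / 90% class split similar to the paper's (values are made up).
toy_rows = [{"id": i, "y": "yes" if i % 10 == 0 else "no"} for i in range(1000)]
balanced_rows = undersample_majority(toy_rows)
yes_count = sum(r["y"] == "yes" for r in balanced_rows)
no_count = len(balanced_rows) - yes_count
print(yes_count, no_count)
```

Undersampling discards most of the No cases, so the balanced training set is much smaller than the original; that is the trade-off accepted here in exchange for a classifier that no longer ignores the Yes class.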
Decision Tree (C4.5 algorithm)

Next, I wanted to check how a decision tree handles the unbalanced data, so I ran the J48 method (Weka's implementation of the C4.5 algorithm). The summary of the model is shown below in Fig. 3. The overall accuracy of the model is 94%, with a decent TPR and F measure and an ROC area of 0.94. When evaluated on the test data set, the model performed much better than the Naïve Bayes model trained on the same unbalanced data and correctly classified 252 of the 526 yes cases (48%). However, the result was not as satisfactory as that of Naïve Bayes on the modified training set.

Decision Tree (C4.5 algorithm) with modified training set

Since the decision tree gave a better result than Naïve Bayes on the original training set, as a final trial I applied the C4.5 algorithm to the modified training dataset. The summary result is shown in Fig. 3 below. Of the four models I experimented with, this one gave the best result. Overall accuracy stands at 90%. Recall and precision for yes cases are the best of the four models, and the F measure for the yes class is 0.9. When validated on the test data set, it correctly classified 464 out of 526 yes cases (88%).

Fig. 3
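The test-set true positive rates quoted for the three usable models follow directly from the reported counts of correctly classified yes cases. The sketch below recomputes them and also shows precision, recall and F measure as functions of TP, FP and FN. The counts passed to `precision_recall_f` at the end are hypothetical, since the paper does not report false positives for the test set.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and F measure for one class from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f_measure


TOTAL_YES_IN_TEST = 526  # actual yes cases in the test set, as reported in the text

# Correctly classified yes cases per model, as reported in the text.
correct_yes = {
    "Naive Bayes (modified training set)": 419,
    "Decision Tree (C4.5)": 252,
    "Decision Tree (modified training set)": 464,
}

test_tpr = {model: count / TOTAL_YES_IN_TEST for model, count in correct_yes.items()}
for model, tpr in test_tpr.items():
    print(f"{model}: TPR = {tpr:.1%}")

# Hypothetical counts, only to show how the three measures relate to each other.
precision, recall, f_measure = precision_recall_f(tp=80, fp=20, fn=20)
print(precision, recall, f_measure)
```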
5.3 Model comparison

Since Naïve Bayes with the unbalanced training data set didn't produce the desired result, that model has been left out of the comparison. The remaining three models are compared below on precision, recall, F measure, ROC area and performance on the test data.

Parameter              Naïve Bayes               Decision Tree   Decision Tree
                       (Modified training set)   (C4.5)          (Modified training set)
Overall Accuracy       78.15%                    93.7%           89.9%
Recall (Class Yes)
Precision (Class Yes)
F Measure (Class Yes)
ROC Area
TPR of test set        80%                       48%             88%

Except for the overall accuracy rate (which is in any case not a very good indicator of model performance), the decision tree with the balanced training dataset outperformed the other two models.

Cumulative Lift Chart

Fig. 4 below shows the cumulative lift chart for all three models. The decision tree with the modified training set gives the best result throughout, except at the lower left of the chart, for samples of fewer than 379 customers, where the model trained on the unbalanced dataset gave a better response rate.

Calculating the lift value when only 30% of customers are to be called:

30% of the total instances in the test set = 0.3 * 4524 = 1357
Lift = Response rate (using top 30% customers) / Response rate (random calling)

                   Random Calling   Naïve Bayes          Decision Tree   Decision Tree
                                    (Modified dataset)                   (Modified dataset)
No. of yes cases
Response Rate      10.6%            30.4%                22.4%           34.4%
Lift
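Applying Formula 2 to the response rates in the table above gives the lift of each model at the 30% sampling level. The sketch below does that arithmetic. The lift values it prints are computed here from the reported response rates and are not taken from the paper; the variable names are chosen for the example.

```python
TEST_SET_SIZE = 4524
SAMPLE_FRACTION = 0.30

# Response rates among the top 30% of customers, as reported in the table.
response_rate = {
    "Random calling": 0.106,
    "Naive Bayes (modified dataset)": 0.304,
    "Decision Tree": 0.224,
    "Decision Tree (modified dataset)": 0.344,
}

# Number of customers called at the 30% sampling level (fraction truncated).
customers_called = int(TEST_SET_SIZE * SAMPLE_FRACTION)

# Lift = response rate of the model / response rate of random calling.
baseline = response_rate["Random calling"]
lift = {model: rate / baseline for model, rate in response_rate.items()}

print(f"Customers called: {customers_called}")
for model, value in lift.items():
    print(f"{model}: lift = {value:.2f}")
```

A lift above 1 means that calling the model's top-ranked 30% of customers yields proportionally more subscribers than calling the same number of customers chosen at random.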
Fig. 4

6 Clustering

Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). Clustering can be informative and can be used very effectively in direct marketing. It can identify the characteristics of customers who are more likely to subscribe to or buy a product, and companies can leverage this information to customize products for different customer segments.

6.1 K means Clustering:

Since the balanced dataset gave better results in the classification exercise, I used the same dataset for clustering as well. I applied the K-means clustering technique in Weka, experimented with different numbers of clusters, and got some good results with 8 clusters. The result is shown below in Fig. 5.
Fig. 5

6.2 Interpretation of Clustering Analysis

Clusters with y as yes represent the groups of customers who subscribed to the term deposit product, and those with y as no are the ones who didn't. From the result above, we can see that cluster 0, cluster 2, cluster 3 and cluster 4 represent customers who subscribed to the term deposit product offered by the bank. Reading some of the characteristics of the customers in cluster 4, they have management jobs, a mean age of 41.7 years, are married, have tertiary education, have an average balance of 1952 euros and have subscribed to the term deposit product.

Cluster 3 and cluster 5 look very similar, both representing customers with admin jobs who are married and have secondary education. But cluster 3 contains customers who subscribed to the term deposit, whereas cluster 5 represents customers who didn't. Looking closely, the average duration of the campaign call in cluster 3 is 435 seconds, as against 295 seconds in cluster 5. This suggests that customers with longer calls may be enquiring about the product in detail because they are interested in it. Similarly, pdays (the number of days that passed after the client was last contacted in a previous campaign) averages 47.6 for customers in cluster 3, as against 12.7 in cluster 5. The low average pdays is possibly because customers who were not contacted before have a value of -1 in the data. Similarly, other clusters can be analyzed for specific characteristics.
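The remark about pdays can be made concrete: because never-contacted customers are coded as -1, a plain average over a cluster is pulled down by every such customer. The sketch below compares the mean with and without the sentinel value. The helper `mean_pdays` and the toy values are hypothetical and are not taken from the dataset.

```python
from statistics import mean

NOT_CONTACTED = -1  # value the text says is used for customers never contacted before


def mean_pdays(values, include_sentinel=True):
    """Average pdays, optionally ignoring customers who were never contacted."""
    if not include_sentinel:
        values = [v for v in values if v != NOT_CONTACTED]
    return mean(values) if values else None


# Hypothetical cluster: three never-contacted customers and two contacted earlier.
toy_pdays = [-1, -1, -1, 90, 120]

with_sentinel = mean_pdays(toy_pdays)
without_sentinel = mean_pdays(toy_pdays, include_sentinel=False)
print(with_sentinel, without_sentinel)
```

When cluster centroids are compared on pdays, it may therefore be worth checking the share of -1 values in each cluster before reading the average as a number of days.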
7 Conclusion

Using the banking dataset, we saw how data mining tools can be used in direct marketing and how data mining techniques can help companies get a better return on investment from their marketing budget. Algorithms such as Naïve Bayes and decision trees can help classify customers as good or bad prospects (based on their propensity to buy a product), and clustering techniques can help in segmenting customers and identifying the characteristics or attributes of good customers. There are many other techniques, such as association rules and other classifier algorithms, that have not been discussed in this paper. Depending on the business objective, the appropriate tool needs to be selected, and different models should be tried and tested to arrive at the best performing model. That model can then be applied to the direct marketing data for best results.

8 Citation

[Moro et al., 2011] S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM'2011, Guimarães, Portugal, October 2011, EUROSIS.

Bamshad Mobasher. Overview of Web Mining and E-Commerce Data Analytics.
More informationData Mining Practical Machine Learning Tools and Techniques
Credibility: Evaluating what s been learned Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 5 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Issues: training, testing,
More informationActive Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
More informationTargeted Marketing, KDD Cup and Customer Modeling
Targeted Marketing, KDD Cup and Customer Modeling Outline Direct Marketing Review: Evaluation: Lift, Gains KDD Cup 1997 Lift and Benefit estimation Privacy and Data Mining 2 Direct Marketing Paradigm Find
More informationMining the Software Change Repository of a Legacy Telephony System
Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,
More informationIBMMS DECISION SUPPORT TOOL FOR MANAGEMENT OF BANK TELEMARKETING CAMPAIGNS
IBMMS DECISION SUPPORT TOOL FOR MANAGEMENT OF BANK TELEMARKETING CAMPAIGNS Ali KELES and Ayturk KELES Department of Computer Education and Instructional Technology, Faculty of Education Agri Ibrahim Cecen
More informationPredicting earning potential on Adult Dataset
MSc in Computing, Business Intelligence and Data Mining stream. Business Intelligence and Data Mining Applications Project Report. Predicting earning potential on Adult Dataset Submitted by: xxxxxxx Supervisor:
More informationPractical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
More informationComparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
More informationFRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS
FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte,
More informationExtension of Decision Tree Algorithm for Stream Data Mining Using Real Data
Fifth International Workshop on Computational Intelligence & Applications IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, November 10, 11 & 12, 2009 Extension of Decision Tree Algorithm for Stream
More informationHow To Understand The Impact Of A Computer On Organization
International Journal of Research in Engineering & Technology (IJRET) Vol. 1, Issue 1, June 2013, 1-6 Impact Journals IMPACT OF COMPUTER ON ORGANIZATION A. D. BHOSALE 1 & MARATHE DAGADU MITHARAM 2 1 Department
More informationCSC 177 Fall 2014 Team Project Final Report
CSC 177 Fall 2014 Team Project Final Report Project Title, Data Mining on Farmers Market Data Instructor: Dr. Meiliu Lu Team Members: Yogesh Isawe Kalindi Mehta Aditi Kulkarni CSc 177 DM Project Cover
More informationT-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577
T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier Santosh Tirunagari : 245577 January 20, 2011 Abstract This term project gives a solution how to classify an email as spam or
More informationAzure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
More informationIBM SPSS Direct Marketing
IBM Software IBM SPSS Statistics 19 IBM SPSS Direct Marketing Understand your customers and improve marketing campaigns Highlights With IBM SPSS Direct Marketing, you can: Understand your customers in
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for
More informationPredicting borrowers chance of defaulting on credit loans
Predicting borrowers chance of defaulting on credit loans Junjie Liang (junjie87@stanford.edu) Abstract Credit score prediction is of great interests to banks as the outcome of the prediction algorithm
More informationGetting Even More Out of Ensemble Selection
Getting Even More Out of Ensemble Selection Quan Sun Department of Computer Science The University of Waikato Hamilton, New Zealand qs12@cs.waikato.ac.nz ABSTRACT Ensemble Selection uses forward stepwise
More informationPredicting Flight Delays
Predicting Flight Delays Dieterich Lawson jdlawson@stanford.edu William Castillo will.castillo@stanford.edu Introduction Every year approximately 20% of airline flights are delayed or cancelled, costing
More informationA Logistic Regression Approach to Ad Click Prediction
A Logistic Regression Approach to Ad Click Prediction Gouthami Kondakindi kondakin@usc.edu Satakshi Rana satakshr@usc.edu Aswin Rajkumar aswinraj@usc.edu Sai Kaushik Ponnekanti ponnekan@usc.edu Vinit Parakh
More informationA Property & Casualty Insurance Predictive Modeling Process in SAS
Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing
More informationData Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
More informationMicrosoft Azure Machine learning Algorithms
Microsoft Azure Machine learning Algorithms Tomaž KAŠTRUN @tomaz_tsql Tomaz.kastrun@gmail.com http://tomaztsql.wordpress.com Our Sponsors Speaker info https://tomaztsql.wordpress.com Agenda Focus on explanation
More informationData Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
More informationUnderstanding Your Customer Journey by Extending Adobe Analytics with Big Data
SOLUTION BRIEF Understanding Your Customer Journey by Extending Adobe Analytics with Big Data Business Challenge Today s digital marketing teams are overwhelmed by the volume and variety of customer interaction
More informationA Basic Guide to Modeling Techniques for All Direct Marketing Challenges
A Basic Guide to Modeling Techniques for All Direct Marketing Challenges Allison Cornia Database Marketing Manager Microsoft Corporation C. Olivia Rud Executive Vice President Data Square, LLC Overview
More informationData Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through
More informationCase Study: Direct Mail Subscriptions Tests
Case Study: Direct Mail Subscriptions Tests Fast, low-cost creative and price tests leading to a 70% jump in response Introduction Magazine publishers depend on direct mail to grow their subscriber base.
More informationData Mining for Fun and Profit
Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools
More informationBusiness Analytics using Data Mining Project Report. Optimizing Operation Room Utilization by Predicting Surgery Duration
Business Analytics using Data Mining Project Report Optimizing Operation Room Utilization by Predicting Surgery Duration Project Team 4 102034606 WU, CHOU-CHUN 103078508 CHEN, LI-CHAN 102077503 LI, DAI-SIN
More informationPredicting Student Performance by Using Data Mining Methods for Classification
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 1 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0006 Predicting Student Performance
More informationData Mining for Direct Marketing: Problems and
Data Mining for Direct Marketing: Problems and Solutions Charles X. Ling and Chenghui Li Department of Computer Science The University of Western Ontario London, Ontario, Canada N6A 5B7 Tel: 519-661-3341;
More informationEnhancing Quality of Data using Data Mining Method
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 9, SEPTEMBER 2, ISSN 25-967 WWW.JOURNALOFCOMPUTING.ORG 9 Enhancing Quality of Data using Data Mining Method Fatemeh Ghorbanpour A., Mir M. Pedram, Kambiz Badie, Mohammad
More informationPerformance Measures for Machine Learning
Performance Measures for Machine Learning 1 Performance Measures Accuracy Weighted (Cost-Sensitive) Accuracy Lift Precision/Recall F Break Even Point ROC ROC Area 2 Accuracy Target: 0/1, -1/+1, True/False,
More informationPREDICTIVE DATA MINING ON WEB-BASED E-COMMERCE STORE
PREDICTIVE DATA MINING ON WEB-BASED E-COMMERCE STORE Jidi Zhao, Tianjin University of Commerce, zhaojidi@263.net Huizhang Shen, Tianjin University of Commerce, hzshen@public.tpt.edu.cn Duo Liu, Tianjin
More informationModeling Lifetime Value in the Insurance Industry
Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting
More informationSTATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table
More informationBIDM Project. Predicting the contract type for IT/ITES outsourcing contracts
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an
More informationWorking with telecommunications
Working with telecommunications Minimizing churn in the telecommunications industry Contents: 1 Churn analysis using data mining 2 Customer churn analysis with IBM SPSS Modeler 3 Types of analysis 3 Feature
More informationNumerical Algorithms Group
Title: Summary: Using the Component Approach to Craft Customized Data Mining Solutions One definition of data mining is the non-trivial extraction of implicit, previously unknown and potentially useful
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 10 Sajjad Haider Fall 2012 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
More informationMaximizing Return and Minimizing Cost with the Decision Management Systems
KDD 2012: Beijing 18 th ACM SIGKDD Conference on Knowledge Discovery and Data Mining Rich Holada, Vice President, IBM SPSS Predictive Analytics Maximizing Return and Minimizing Cost with the Decision Management
More informationUnderstanding Characteristics of Caravan Insurance Policy Buyer
Understanding Characteristics of Caravan Insurance Policy Buyer May 10, 2007 Group 5 Chih Hau Huang Masami Mabuchi Muthita Songchitruksa Nopakoon Visitrattakul Executive Summary This report is intended
More informationScoring the Data Using Association Rules
Scoring the Data Using Association Rules Bing Liu, Yiming Ma, and Ching Kian Wong School of Computing National University of Singapore 3 Science Drive 2, Singapore 117543 {liub, maym, wongck}@comp.nus.edu.sg
More information