USING PREDICTIVE ANALYTICS FOR EFFECTIVE CROSS-SELLING
Michael Combopiano, Northwestern University
Sunil Kakade, Northwestern University

Abstract--The decision tree classification algorithm may be used to determine which companies in a CRM system are likely, and which are not likely, to accept an offer for a product or service that has not yet been offered to them. This paper outlines the steps taken to run such a project with real CRM data. After many iterations of data preparation, a suitable model was attained. Though the final model did not make use of most of the attributes supplied, it generated very agreeable accuracy and ROC metrics. Furthermore, the resulting decision tree will be quite intuitive to non-technical audiences, and the business value provided will be well received. Lastly, the results of the model are logical from a business-knowledge standpoint, and as such will not pose a challenge to implement and act on in our production CRM and sales environment.

Keywords: Weka, J48, decision tree, classification algorithm, predictive analytics, CRM

I. INTRODUCTION

The goal of this project is to examine customers and prospects of the BMO Harris Commercial and Business banks to ascertain which ones are likely to accept offers for cash management services. The Business Bank in general targets loans (business credit) to companies with $3MM to $20MM in annual sales, and the Commercial Bank segment is over $20MM. Each of these lines of business has a separate division, Treasury and Payment Solutions, to provide basic non-loan banking services such as business deposit accounts, business savings/money markets, receivables collection, and so forth. The vast majority of new sales opportunities are loans, not cash management services. For various reasons, we have found that many of our business loan customers and prospects have not been exposed to the sales process for our cash management services. We would like to determine which of this population would most likely respond positively to an offer for cash management services.

Since we wish to analyze customers and prospects, an ideal data source for this project is our Oracle CRM On Demand system. (CRM is Customer Relationship Management, which is our system of record for tracking all sales information, or pipeline management, and meaningful points of contact with our customers, prospects and referral sources.) However, this data is not without its challenges. The first and most formidable is that by its nature, pipeline and interaction (activity) data is subjective and in general does not lend itself well to downstream confirmation. Whereas we as system administrators can rather reliably detect and correct firmographic data entry errors (legal entity name, address, etc.), we cannot as readily detect and correct errors in data elements such as when a sales opportunity advanced from one stage to the next, or what was decided in a meeting with a client, or even when that meeting took place. The most reliable control we have over quality, and it is a strong one, is that we measure certain data elements as Key Performance Indicators which directly influence job performance appraisal and compensation. Therefore, since our CRM system provides information on customers and prospects, our two desired populations for study, and since we feel it is reasonably reliable, it makes a good data source for our goal of determining which customers and prospects are most likely to accept offers for cash management services.
The actionable output of this project will be a classification value appended to each of our customer and prospect records in CRM that have not yet been offered our cash management services. This value will inform our sales team as to the relative likelihood of that company being receptive to our offer. The specific steps to be executed are as follows:
1. Extract records from the Entities (companies) and Opportunities (sales pipeline) objects in Oracle CRM On Demand
2. Select data elements to be considered for use in Weka
3. Perform data cleanup and preparation for use in Weka
4. Determine which attributes will deliver the most value
5. Generate training, testing and production data (the latter is made up of the rows for which we desire a predicted value)
6. Build model: run the J48 classification algorithm with varied parameters against the training/test data, choose the best set of parameters and save the model
7. Run the model against the production data, append the predicted values and re-import them back into CRM
8. Construct a report in CRM to provide prospecting recommendations to the sales team

II. DATA UNDERSTANDING

As mentioned in the introduction, all data used for this project came from our Oracle CRM On Demand system. The following table shows which fields (attributes) were chosen for extract from CRM, along with additional information about each field.

CRM Table | Field | Type | Description
Entity | Name | Text | Legal company name
Entity | Type | Text | Customer, prospect, etc.
Entity | * Entity ID | Text | Key, unique ID
Entity | Annual Sales | Currency | Company size in sales
Entity | Last Contact Date | Date | Last recorded meeting/call
Entity | Number of Employees | Number | Employee count
Entity | Parent Entity | Text | Name of corporate parent
Entity | Priority | Text | Top 10, Top 50, etc.
Entity | State | Text | US state of company residence
Entity | Annual Revenue Tier | Text | Annual Sales categories
Entity | Entity SIC # | Text | OSHA "Standard Industrial Classification"
Entity | TPS Sales Mgr | Text | Cash Management salesperson
Entity | BMO Relationship Role | Text | Lead Bank, Participant, etc.
Entity | Lead Bank | Text | If not BMO, name of competitor bank
Entity | LOB | Text | Line of Business
Entity | LOB Segment | Text | Division within line of business
Entity | Entity Primary | Text | CRM entity record owner, key salesperson
Entity | Number of Activities | Number | Count of all activities (meetings/calls)
Entity | Number of Contacts | Number | Count of attached contact (person) records
Entity | Number of Opportunities | Number | Count of attached opportunities
Entity | Number of Wins | Number | Count of closed/won opportunities
Opportunity | * Opportunity ID | Text | Key, unique ID
Opportunity | Sales Stage | Text | Pitched, Closed/Won, Closed/Lost, etc.
Opportunity | Category Name | Text | Loans, deposits, cash management, etc.
Opportunity | * Entity ID | Text | Foreign key
Sales Stage History | * Opportunity ID | Text | Foreign key
Sales Stage History | Sales Stage | Text | Pitched, Closed/Won, Closed/Lost, etc.
Sales Stage History | # of Days in Stage | Number | Count of days in current sales stage
Table 1: CRM extract fields

These three tables were loaded into Microsoft Access, where queries were used to concatenate them into one table and augment that table with additional derived information as described in the next section, Data Preparation.

III. DATA PREPARATION

The above three tables were combined in Microsoft Access into one table for use in Weka (J48 algorithm). Table 2 describes this file when it was complete in Microsoft Access. The third column provides pseudocode for fields that were derived from other fields. *TPS stands for Treasury and Payment Systems, which is the name for our cash management line of business. The TPS Status field indicates historically whether a prospect/customer has been offered these services, and whether or not the offer was accepted. The last field, Likely to Buy TPS, is the class attribute in Weka.

When this data set was initially loaded into Weka, there were a number of issues, primarily caused by forbidden characters. Some of the observations and lessons learned include the following:

- The SIC field required a lot of work. SIC is an industry classification denoting the type of business the entity specializes in. For example, Testa Produce, Inc., a Chicago-area concern, has SIC number 5148 with a description of Fresh fruits and vegetables. (These designations are codified in the USA by OSHA.) Such a data element held great promise as an attribute used to predict likelihood to purchase cash management services. Fortunately, the multitude of possible values can be categorized by their first two digits into eleven values, which was done.
- These fields were removed, as they were either only intended to be used to derive other more useful attributes or were represented by other retained attributes:
  - Annual Sales (better represented by Annual Revenue Tier, since this has far fewer values and is just as useful in terms of describing company size)
  - Days Since Last Contact (better represented by Years Since Last Contact)
  - Days in Stage (this is the number of days a sales opportunity has been in its current stage, and it was used to calculate TPS Status; as a standalone field it intuitively is not useful as a predictor in this model)
  - Entity Name and Entity ID were removed as they add no value to the predictive process

With the fields chosen, the next step was to divide the data into two sets: a training and testing set, and the set for which target values were desired in the class field Likely to Buy TPS. The training/testing set contains 15,775 rows and the to-be-determined data set has 143,309 rows (there were 159,088 rows in total). To extract the training and testing set, all rows were pulled where Likely to Buy TPS had a Y or an N value.

Having developed the training and testing set, the next step was to determine which attributes would provide the most value in the chosen predictive model. To help narrow the field of candidate attributes (Table 3), domain knowledge was combined with experience and knowledge of the J48 algorithm to arrive at the first set of candidate attributes. Table 3 was loaded into Weka for the initial run of the J48 algorithm.
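The split described above can also be automated once the Access output has been exported. The following sketch (Java, Weka's native language) separates the labeled training/testing rows from the unlabeled production rows; the file name crm_extract.csv is a hypothetical export, and the class column Likely to Buy TPS is assumed to be last.

import java.io.File;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.CSVLoader;

public class SplitTrainingAndProduction {
    public static void main(String[] args) throws Exception {
        // Load the flattened extract produced in Microsoft Access (hypothetical file name).
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("crm_extract.csv"));
        Instances all = loader.getDataSet();

        // The class attribute "Likely to Buy TPS" is assumed to be the last column.
        all.setClassIndex(all.numAttributes() - 1);

        Instances trainTest = new Instances(all, 0);   // rows with a Y/N label
        Instances production = new Instances(all, 0);  // rows still to be scored

        for (int i = 0; i < all.numInstances(); i++) {
            Instance row = all.instance(i);
            if (row.classIsMissing()) {
                production.add(row);   // no label yet: to be predicted later
            } else {
                trainTest.add(row);    // labeled Y or N: used for training/testing
            }
        }
        System.out.println("Training/testing rows: " + trainTest.numInstances());
        System.out.println("Production rows:       " + production.numInstances());
    }
}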
MS Access Field | Type | Source or Access Pseudocode
Entity Name | |
Entity Type | |
Entity ID | Numeric |
Annual Sales | Numeric |
Days Since Last Contact | Numeric | Now() - [Entity].[Last Contact Date]
Years Since Last Contact | Numeric | Round([Entity].[Days Since Last Contact] / 365, 1)
Number of Employees | Numeric |
Has Parent Entity | | = "Y" if [Entity].[Parent Entity] Is Not Null
Priority | |
State | |
Annual Revenue Tier | |
SIC | |
SIC 2-Digit | | = Left([Entity].[SIC],2)
SIC Category | | (from SIC table from OSHA website matched to SIC 2-Digit)
SIC Description | |
Has TPS Sales Mgr | | = "Y" if [Entity].[TPS Sales Mgr] Is Not Null
BMO Relationship Role | |
Lead Bank | |
Lead Bank Categorized | | Retained top 20 values, changed all remaining to "Other"
LOB - New | |
LOB Segment | |
Entity Primary | |
Number of Activities | Numeric |
Number of Contacts | Numeric |
Number of Opportunities | Numeric |
Number of Wins | Numeric |
Days in Stage | Numeric | = [Sales Stage History].[# of Days in Stage]
TPS Status* | | = "Success" if [Opportunity].[Category Name] = "Cash Management" and [Opportunity].[Sales Stage] In ("Closed/Won","04 - Engaged","05 - Implementation")
            | | = "Fail" if [Opportunity].[Category Name] = "Cash Management" and [Opportunity].[Sales Stage] In ("Closed/Lost","08 - Decline","09 - Inactive","03 - On Hold")
            | | = "Fail" if [Opportunity].[Category Name] = "Cash Management" and [Opportunity].[Sales Stage] In ("00 - Long Term Prospect","01 - Identified Needs","02 - Pitched /Proposed") and [Sales Stage History].[# of Days in Stage] > 365
            | | = "In Progress" if [Opportunity].[Category Name] = "Cash Management" and [Opportunity].[Sales Stage] In ("00 - Long Term Prospect","01 - Identified Needs","02 - Pitched /Proposed") and [Sales Stage History].[# of Days in Stage] < 180
            | | = "Not Attempted" if [TPS Status] Is Null
Has Business Credit | | = "Y" if [Opportunity].[Category Name] = "Business Credit"
Likely to Buy TPS | | = "Y" if [TPS Status] = "Success"
                  | | = "N" if [TPS Status] = "Fail"
Table 2: MS Access fields
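To make the Table 2 pseudocode for the derived class label concrete, the following is a minimal Java sketch of the TPS Status and Likely to Buy TPS rules. It simplifies to a single cash-management opportunity per entity; the actual Access queries operate over the joined tables.

import java.util.Set;

public class TpsStatusDerivation {

    static final Set<String> SUCCESS_STAGES =
            Set.of("Closed/Won", "04 - Engaged", "05 - Implementation");
    static final Set<String> FAIL_STAGES =
            Set.of("Closed/Lost", "08 - Decline", "09 - Inactive", "03 - On Hold");
    static final Set<String> EARLY_STAGES =
            Set.of("00 - Long Term Prospect", "01 - Identified Needs", "02 - Pitched /Proposed");

    // Mirrors the Access pseudocode in Table 2 for the derived TPS Status field.
    static String tpsStatus(String categoryName, String salesStage, int daysInStage) {
        if (!"Cash Management".equals(categoryName)) {
            return "Not Attempted";                      // no cash management offer recorded
        }
        if (SUCCESS_STAGES.contains(salesStage)) return "Success";
        if (FAIL_STAGES.contains(salesStage))    return "Fail";
        if (EARLY_STAGES.contains(salesStage)) {
            if (daysInStage > 365) return "Fail";        // stalled early-stage opportunity
            if (daysInStage < 180) return "In Progress"; // still being actively worked
        }
        return "Not Attempted";                          // no rule fired (null in Table 2)
    }

    // Class attribute: "Y" for Success, "N" for Fail, missing otherwise.
    static String likelyToBuyTps(String tpsStatus) {
        switch (tpsStatus) {
            case "Success": return "Y";
            case "Fail":    return "N";
            default:        return "?"; // Weka's marker for a missing value
        }
    }

    public static void main(String[] args) {
        String status = tpsStatus("Cash Management", "Closed/Won", 40);
        System.out.println(status + " -> " + likelyToBuyTps(status)); // Success -> Y
    }
}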
Field/Attribute | Comments and Intuitive Observations | Decision
TPS Status | Used to set the historical value in the class attribute Likely to Buy TPS for training data, therefore redundant as a training attribute | Remove
Entity ID | Not useful as a predictor | Remove
Number of Wins | Intuitively very valuable as an indicator of likelihood to accept offers |
Entity Type | Intuitive that current customers should be more receptive to offers |
Entity Primary | Categorized by LOB Segment; will be useful when running the model for each LOB or LOB segment | Remove
LOB Segment | Intuitive that some sales teams are better at cross-sell than others |
Days Since Last Contact | Redundant: Years Since Last Contact is a better choice | Remove
Number of Activities | Same as above |
Years Since Last Contact | Same as above |
LOB | Redundant: better served by LOB Segment | Remove
Number of Opportunities | Possible indicator of quality and depth of sales relationship |
State | Better represented by "LOB" and "LOB Segment" | Remove
Number of Contacts | Possible indicator of depth of relationship, though "count of activities per contact" would be better | Remove
Lead Bank Categorized | Some competitors should be easier to win cash management business from |
SIC Category | Possible that some industries are more dependent on cash management services |
Priority | Though arbitrary and subjective, high-priority customers and prospects should correlate with higher likelihood of accepting offers |
BMO Relationship Role | Where BMO is lead bank, should increase likelihood of accepting offers |
Annual Revenue Tier | Unknown if this would be a factor in likelihood to accept cash mgt. offers - clustering would be helpful to better understand |
Has TPS Sales Mgr | Probably not useful as an indicator, not reliably provided in CRM | Remove
Number of Employees | Unknown if this would be a factor in likelihood to accept cash mgt. offers - clustering would be helpful to better understand | Remove
Has Parent Entity | Unknown if this would be a factor in likelihood to accept cash mgt. offers - clustering would be helpful to better understand | Remove
Has Business Credit | "Y" indicates customer has a loan; probably not useful as a predictor since other attributes accomplish the same goal ("TPS Status", "Entity Type") and rank far higher | Remove
Table 3: Attributes
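The decisions in Table 3 were made from domain knowledge. As a complementary, purely data-driven check (not part of the original workflow), Weka can rank candidate attributes by information gain against the class attribute; the following is a minimal sketch, assuming the hypothetical training file name used elsewhere in these examples.

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RankCandidateAttributes {
    public static void main(String[] args) throws Exception {
        // Labeled training/testing rows (hypothetical file name), class attribute last.
        Instances data = DataSource.read("likely_to_buy_tps_train.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Rank every candidate attribute by information gain with respect to the class.
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new InfoGainAttributeEval());
        selector.setSearch(new Ranker());
        selector.SelectAttributes(data);

        // Prints each attribute with its information-gain score, highest first.
        System.out.println(selector.toResultsString());
    }
}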
IV. DATA MINING ALGORITHM

For this project, the J48 classification algorithm in Weka was chosen, as it is a perfect fit for determining classification for given instances. In this project, we seek to classify each instance as likely or not likely to accept an offer for cash management (TPS) services. J48 is a decision tree algorithm, a very powerful predictive tool with the following characteristics to recommend it:
- It is rather intuitive and readily understandable by non-technical audiences
- It is adept at working with multiple types of data (numerical, nominal)
- It is also adept at accommodating missing values, a key problem in many data sets
- A standard home personal computer offers outstanding performance for this algorithm (this project ran in 0.2 seconds)
Simply stated, a decision tree resembles a flowchart showing a starting point (root node), decisions made, internal stages as a result of the decisions (internal nodes), and final end points or final decisions (leaf nodes). The following graphic is an annotated copy of the actual decision tree, represented graphically, from this project:

[Figure: annotated decision tree for this project]

Decision tree algorithms are able to work with the following data characteristics:
- Numeric: Numbers, currency, etc.
- Binary: Yes/No, Buy/Not Buy, Send/Don't send, etc.
- Dates: (self-explanatory)
- Nominal: Names of categorical values such as Customer, Prospect, etc. Note that the class variable (more on this later) must be nominal, not date nor numeric.
- Unary: Represents a numerical value. Examples are the use of hash marks or Roman numerals.
- Null: A very useful feature is that most (if not all) decision tree algorithms, including the Weka J48, can accommodate missing values.
In order for decision tree results to be understandable, useful, and transportable to real business needs, the tree itself must be kept down to a reasonable size. The J48 algorithm will assist, but the human operator must also contribute by ensuring the data is well suited to this goal. Some of the characteristics to aim toward are as follows:
- Reserved characters: Since Weka uses Java, reserved characters must be removed, such as those that are found on the numeric keys above the letters on a computer keyboard.
- Aggregation: Fields that have too many values will create too many nodes in a decision tree, and should be collapsed or aggregated if possible. An example in this project is the collapse of SIC (industry descriptions) into eleven categories.
- Appropriate Dimensionality and Feature Creation: A reader of this paper who is experienced in decision tree execution might have been alarmed by the initial quantity of fields chosen for extraction from the CRM system, and rightly so. These fields were initially extracted as they were thought to have some potential value to the predictive process, but many were used only for derivation of other fields (SIC was used to derive SIC Category), and some, upon further reflection, were decided to have too little or no value to the predictive process, and so were removed.

As mentioned, the Weka J48 decision tree algorithm also helps ensure a reasonably-sized decision tree. Decision tree models do so by determining the most efficient combination of attributes and where to split them according to their values. The two steps are as follows:
1. Select an attribute to represent the root (starting) node, and build an outbound path and node for each possible value
2. Continue splitting each node until leaf or end nodes are reached (a leaf node occurs when all values of that attribute are the same within that node)

Following the precepts of Hunt's Algorithm, the J48 algorithm performs this task many times over until the optimal model is derived. Determination of the optimal model is accomplished by measuring the degree of purity or homogeneity of values contained within a node. A node displays perfect purity or homogeneity if all values are the same. A node does not have to display perfect purity to become a leaf node, and the model will keep trying until the aggregate purity of all leaf nodes is as high as possible. Three methods widely employed by decision tree algorithms to determine the optimal degree of purity of each node are:
- GINI Index: Lower value is better
- Degree of Entropy: Lower value is better
- Information Gain: Higher value is better

Please see Appendix A for descriptions and illustrations of these three measures.

A visual review of the resulting decision tree is one measure of the degree of success of each iteration of the J48 model. (Multiple iterations are strongly recommended.) Another visual representation of degree of success is called the confusion matrix. The confusion matrix is simply a table showing the counts of true positives, true negatives, false positives and false negatives. Generally speaking, the actual values are arranged along the X axis (columns) and the predicted values are shown along the Y axis (rows). The following is an annotated version of the confusion matrix generated by the Weka J48 model for this project:

            | Actual Y              | Actual N
Predicted Y | 9,706 = True Positive | 997 = False Positive
Predicted N | 235 = False Negative  | 4,837 = True Negative

These definitions apply to the confusion matrix:
- True Positive: Instances where the model predicted a value of Y and the actual value was in fact Y
- True Negative: Instances where the model predicted a value of N and the actual value was N
- False Positive: The model predicted a value of Y but the actual value was N
- False Negative: The model predicted a value of N but the actual value was Y

From the confusion matrix, these metrics may be derived and used to ascertain the overall applicability or desirability of the model:
- Accuracy: Expressed as (True Positives + True Negatives) / (sum of all four values)
- Precision: Expressed as (True Positives) / (True Positives + False Positives)
- Recall: Expressed as (True Positives) / (True Positives + False Negatives)

These measures are suitable for attributes with even data distribution, but are very misleading when values are heavily skewed toward one value. (The accuracy value will be high, but the model will not have assigned any class values for the under-represented class.) Because of this, a much more meaningful measure for determining the success and applicability of a model is the Receiver Operating Characteristic (ROC) curve. The ROC curve is a visual representation of the data points in a two-dimensional graph. Each classification is represented within the curve. True Positive values are plotted on the Y axis and False Positives are shown on the X axis. The ROC value represents the area between the curve and the x axis, or the area under the curve. A higher value is more desirable, and as such, the goal is to choose a point on the curve, as near the upper left quadrant as possible, that provides the highest possible count of true positives with a minimum of false positives, or the highest tolerable count of false positives.
(In our sales application, false positives are more acceptable than they would be in a medical study, where a false positive might lead to invasive, costly and unnecessary procedures.)
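To make these definitions concrete, the following sketch computes accuracy, precision and recall directly from the confusion-matrix counts reported above for this project.

public class ConfusionMatrixMetrics {
    public static void main(String[] args) {
        // Counts from the confusion matrix above.
        double tp = 9706;  // predicted Y, actual Y
        double fp = 997;   // predicted Y, actual N
        double fn = 235;   // predicted N, actual Y
        double tn = 4837;  // predicted N, actual N

        double accuracy  = (tp + tn) / (tp + tn + fp + fn); // ~0.922
        double precision = tp / (tp + fp);                  // ~0.907
        double recall    = tp / (tp + fn);                  // ~0.976

        System.out.printf("Accuracy:  %.3f%n", accuracy);
        System.out.printf("Precision: %.3f%n", precision);
        System.out.printf("Recall:    %.3f%n", recall);
    }
}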
Following is the actual ROC curve for this project:

[Figure: ROC curve for this project]

We see in this graph that there is a very concentrated cluster directly in the desired area, circled in red in the upper left quadrant. This indicates an overwhelming concentration of the desired outcome of as many true positives as possible with the smallest count of false positives. This too is reflected in the very high value of 0.928 for the Area Under the Curve. This concept will be revisited in the following section, wherein the success and suitability of the model will be examined.

V. EXPERIMENTAL RESULTS AND ANALYSIS

This section discusses the findings from executing the Weka J48 classification algorithm with the following characteristics:
- The algorithm used was weka.classifiers.trees.J48 -C 0.25 -M 2
- There were 15,775 instances (rows of data) with 12 attributes (including the class attribute), as described previously in the section titled Data Preparation
- The test mode chosen was 10-fold cross-validation

The model constructed a tree with five leaf (end-point) nodes, with 9 nodes in total. This is a very reasonably-sized tree, though it is interesting to note that very few of the attributes appear in the tree. A practical interpretation of this outcome is that the chosen attributes work very well within a predictive model for this set of data, but it would be highly advisable to try different combinations of attributes on smaller subsets of these 15,775 instances. The decision tree is as follows:

[Figure: graphical display of the decision tree]

Weka also provides a text display of the decision tree:

# of Wins <= 0: N (4856.0/137.0)
# of Wins > 0
|   # of Wins <= 1
|   |   # of Opportunities <= 2: Y (3441.0/149.0)
|   |   # of Opportunities > 2
|   |   |   # of Opportunities <= 4: Y (523.0/204.0)
|   |   |   # of Opportunities > 4: N (222.0/85.0)
|   # of Wins > 1: Y (6733.0/625.0)

The left figure within a set of parentheses is the count or weight of instances that wound up in that leaf, and the right figure is the count of misclassified instances. If there are digits to the right of the decimal point, this signifies that there were missing data elements.
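The run described in this section can be reproduced outside the Weka Explorer with the Weka Java API. The sketch below uses the same parameters (-C 0.25 -M 2) and 10-fold cross-validation; the ARFF file name is a placeholder for the 15,775-row training/testing extract.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RunJ48CrossValidation {
    public static void main(String[] args) throws Exception {
        // Load the labeled training/testing set (hypothetical file name).
        Instances data = DataSource.read("likely_to_buy_tps_train.arff");
        data.setClassIndex(data.numAttributes() - 1); // "Likely to Buy TPS"

        // J48 with the parameters used in this project: -C 0.25 -M 2.
        J48 tree = new J48();
        tree.setOptions(new String[] {"-C", "0.25", "-M", "2"});
        tree.buildClassifier(data);

        // 10-fold cross-validation, as in the experiment described above.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));

        System.out.println(tree);                   // text display of the decision tree
        System.out.println(eval.toSummaryString()); // correctly/incorrectly classified counts
        System.out.println(eval.toMatrixString());  // confusion matrix
        System.out.println("AUC (class Y): "
                + eval.areaUnderROC(data.classAttribute().indexOfValue("Y")));
    }
}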
A prose interpretation of this decision tree might be presented thusly:

1. If a company/entity has no closed/won opportunities (they have accepted none of our offers), then for purposes of this sales campaign they should be ignored, as they are highly likely to reject our offers for cash management services. This represents approximately 33% of our examined records.
2. However, if a company has 1 or more closed/won opportunities, we should offer cash management services, as there is a very good chance of acceptance. In fact, if the count of closed/won opportunities is greater than one, there is a 91% chance of acceptance according to this model.
3. If there is exactly one closed/won opportunity, then we next consider the total number of opportunities pitched to this company. If the count is less than two, then we have a 95% chance of acceptance of a cash management services offer.
4. If there are more than 2 opportunities but less than 4, chances of acceptance are 60%.
5. However, if there are more than 4 opportunities, chances of rejection are 62%. Since this is only 1.4% of our examined population, these should be ignored along with the first group.

Given the above interpretation, a suitable recommendation would be to include groups 2 and 3 in a premier campaign to offer cash management services. This encompasses 10,174 companies, or 64% of our sample. Group 4, encompassing 523 companies or 3.3% of our sample, should be in a secondary campaign after the premier campaign is complete.

But how trustworthy is this decision tree? To start, Weka provides these metrics:

Metric | Count | Ratio
Correctly Classified Instances | 14,543 | 92.2%
Incorrectly Classified Instances | 1,232 | 7.8%
Total Number of Instances | 15,775 |

Certainly these top two metrics are very encouraging, showing an accuracy rating of 92.2%. But recalling that accuracy may not be a good measure of success, we should instead consider our ROC value (see the actual ROC graph shown above). Since this value is 0.928, we can be very certain that this is a successful model for this extract of data. (See Appendix B for more of the text output from Weka.)

VI. CONCLUSION

This project sought to construct a reliable predictive model to classify whether a company in our CRM system would be likely or not likely to accept an offer for cash management services. To accomplish this, a sample set was extracted from CRM where it was determined whether or not a company had already been offered cash management services, and whether or not that company accepted the offer. After several iterations of data preparation, the Weka J48 algorithm eventually produced a model with very respectable metrics: 92% accuracy and a ROC value of 0.928. Additionally, the decision tree, though it doesn't use many of the attributes provided, is very simple, quite intuitive, passes logical examination by a domain expert, and will be quite practical to implement in a production environment.

VII. FUTURE WORK

Assuming the findings of this project are agreeable, the immediate next step is to re-run the model for the remaining 143,309 CRM records of companies that have not yet been offered cash management services. Once this is done, the results of the decision tree will be loaded into CRM and a campaign will be initiated to target the appropriate companies as indicated by the results of the model. These targeted companies will be flagged in CRM so that over time sales effectiveness may be compared between the subjects of this project and those that preceded the project.

REFERENCES

Tan, Pang-Ning, Michael Steinbach, and Vipin Kumar. Introduction to Data Mining. Boston: Pearson Addison Wesley, 2005.
Provost, Foster, and Tom Fawcett. Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking. Sebastopol, CA: O'Reilly Media, Inc., 2013.
