USING PREDICTIVE ANALYTICS FOR EFFECTIVE CROSS-SELLING

Michael Combopiano, Northwestern University, Michael.Comobopiano@att.net
Sunil Kakade, Northwestern University, Sunil.kakade@gmail.com

Abstract--The decision tree classification algorithm may be used to determine which companies in a CRM system are likely, and which are not likely, to accept an offer for a product or service that has not yet been offered to them. This paper outlines the steps taken to run such a project with real CRM data. After many iterations of data preparation, a suitable model was attained. Though the final model did not make use of most of the attributes supplied, it generated very agreeable accuracy and ROC metrics. Furthermore, the resulting decision tree is quite intuitive for non-technical audiences, and the business value provided will be well received. Lastly, the results of the model are logical from a business-knowledge standpoint, and as such will not pose a challenge to implement and act on in our production CRM and sales environment.

Keywords: Weka, J48, decision tree, classification algorithm, predictive analytics, CRM

I. INTRODUCTION

The goal of this project is to examine customers and prospects of the BMO Harris Commercial and Business banks to ascertain which ones are likely to accept offers for cash management services. The Business Bank generally targets loans (business credit) to companies with $3MM to $20MM in annual sales, while the Commercial Bank segment covers companies over $20MM. Each of these lines of business has a separate division, Treasury and Payment Solutions, to provide basic non-loan banking services such as business deposit accounts, business savings/money markets, receivables collection, and so forth. The vast majority of new sales opportunities are loans, not cash management services. For various reasons, we have found that many of our business loan customers and prospects have not been exposed to the sales process for our cash management services.
We would like to determine which of this population would most likely respond positively to an offer for cash management services. Since we wish to analyze customers and prospects, an ideal data source for this project is our Oracle CRM On Demand system. (CRM is Customer Relationship Management, our system of record for tracking all sales information, or pipeline management, and meaningful points of contact with our customers, prospects and referral sources.)

However, this data is not without its challenges. The first and most formidable is that, by its nature, pipeline and interaction (activity) data is subjective and in general does not lend itself well to downstream confirmation. Whereas we as system administrators can rather reliably detect and correct firmographic data entry errors (legal entity name, address, etc.), we cannot as readily detect and correct errors in data elements such as when a sales opportunity advanced from one stage to the next, what was decided in a meeting with a client, or even when that meeting took place. The most reliable control we have over quality, and it is a strong one, is that we measure certain data elements as Key Performance Indicators which directly influence job performance appraisal and compensation.

Therefore, since our CRM system provides information on customers and prospects, our two desired populations for study, and since we feel it is reasonably reliable, it makes a good data source for our goal of determining which customers and prospects are most likely to accept offers for cash management services.

The actionable output of this project will be a classification value appended to each customer and prospect record in CRM that has not yet been offered our cash management services. This value will inform our sales team of the relative likelihood of that company being receptive to our offer. The specific steps to be executed are as follows:
1. Extract records from the Entities (companies) and Opportunities (sales pipeline) objects in Oracle CRM On Demand
2. Select data elements to be considered for use in Weka
3. Perform data cleanup and preparation for use in Weka
4. Determine which attributes will deliver the most value
5. Generate training, testing and production data (the latter is made up of the rows for which we desire a predicted value)
6. Build model: run the J48 classification algorithm with varied parameters against the training/test data, choose the best set of parameters and save the model
7. Run the model against the production data, append predicted values and re-import back into CRM
8. Construct a report in CRM to provide prospecting recommendations to the sales team

II. DATA UNDERSTANDING

As mentioned in the introduction, all data used for this project came from our Oracle CRM On Demand system. The following table shows which fields (attributes) were chosen for extract from CRM, along with additional information about each field. Key fields are marked with an asterisk.

Entity table:
- Entity Name (Text): Legal company name
- Entity Type (Text): Customer, prospect, etc.
- * Entity ID (Text): Key, unique ID
- Annual Sales (Currency): Company size in sales
- Last Contact Date (Date): Last recorded meeting/call
- Number of Employees (Number): Employee count
- Parent Entity (Text): Name of corporate parent
- Priority (Text): Top 10, Top 50, etc.
- State (Text): US state of company residence
- Annual Revenue Tier (Text): Annual Sales categories
- Entity SIC # (Text): OSHA "Standard Industrial Classification"
- TPS Sales Mgr (Text): Cash Management salesperson
- BMO Relationship Role (Text): Lead Bank, Participant, etc.
- Lead Bank (Text): If not BMO, name of competitor bank
- LOB (Text): Line of Business
- LOB Segment (Text): Division within line of business
- Entity Primary (Text): CRM entity record owner, key salesperson
- Number of Activities (Number): Count of all activities (meetings/calls)
- Number of Contacts (Number): Count of attached contact (person) records
- Number of Opportunities (Number): Count of attached opportunities
- Number of Wins (Number): Count of closed/won opportunities

Opportunity table:
- * Opportunity ID (Text): Key, unique ID
- Sales Stage (Text): Pitched, Closed/Won, Closed/Lost, etc.
- Category Name (Text): Loans, deposits, cash management, etc.
- * Entity ID (Text): Foreign key

Sales Stage History table:
- * Opportunity ID (Text): Foreign key
- Sales Stage (Text): Pitched, Closed/Won, Closed/Lost, etc.
- # of Days in Stage (Number): Count of days in current sales stage

Table 1: CRM fields chosen for extract

These three tables were loaded into Microsoft Access, where queries were used to concatenate them into one table and augment that table with additional derived information as described in the next section, Data Preparation.

III. DATA PREPARATION

The above three tables were combined in Microsoft Access into one table for use in Weka (J48 algorithm). Table 2 describes this file when it was complete in Microsoft Access. The third column provides pseudocode for fields that were derived from other fields.

* TPS stands for Treasury and Payment Systems, which is the name of our cash management line of business. The TPS Status field indicates historically whether a prospect/customer has been offered these services, and whether or not the offer was accepted. The last field, Likely to Buy TPS, is the class attribute in Weka.

When this data set was initially loaded into Weka, there were a number of issues, primarily caused by forbidden characters. Some of the observations and lessons learned include the following:

The SIC field required a lot of work. SIC is an industry classification denoting the type of business the entity specializes in. For example, Testa Produce, Inc., a Chicago-area concern, has SIC number 5148 with a description of "Fresh fruits and vegetables." (These designations are codified in the USA by OSHA.) Such a data element held great promise as an attribute used to predict likelihood to purchase cash management services. Fortunately, the multitude of possible values can be categorized by their first two digits into eleven values, which was done.
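The two-digit collapse described above can be sketched in code. The paper does not list the eleven categories it used, so the category names and ranges below follow the standard SIC division scheme and are assumptions for illustration only:

```python
# Illustrative sketch (not the production Access query): collapse a raw
# SIC code into a broad category via its first two digits, mirroring the
# "SIC 2-Digit = Left([Entity].[SIC], 2)" derivation described later in
# Table 2. The eleven divisions below are the standard SIC scheme, assumed
# here for illustration.

SIC_DIVISIONS = {
    range(1, 10):   "Agriculture, Forestry & Fishing",
    range(10, 15):  "Mining",
    range(15, 18):  "Construction",
    range(20, 40):  "Manufacturing",
    range(40, 50):  "Transportation & Utilities",
    range(50, 52):  "Wholesale Trade",
    range(52, 60):  "Retail Trade",
    range(60, 68):  "Finance, Insurance & Real Estate",
    range(70, 90):  "Services",
    range(91, 98):  "Public Administration",
    range(99, 100): "Nonclassifiable",
}

def sic_category(sic):
    """Map a raw SIC string (e.g. '5148') to a broad industry category."""
    try:
        two_digit = int(sic[:2])  # equivalent of Left([SIC], 2)
    except (ValueError, TypeError, IndexError):
        return "Unknown"          # J48 tolerates missing values anyway
    for r, name in SIC_DIVISIONS.items():
        if two_digit in r:
            return name
    return "Unknown"

print(sic_category("5148"))  # Testa Produce, Inc. -> Wholesale Trade
```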
These fields were removed as they were either only intended to be used to derive other, more useful attributes or were represented by other retained attributes:
- Annual Sales (better represented by Annual Revenue Tier, since this has far fewer values and is just as useful in terms of describing company size)
- Days Since Last Contact (better represented by Years Since Last Contact)
- Days in Stage (this is the number of days a sales opportunity has been in its current stage, and it was used to calculate TPS Status; as a standalone field it intuitively is not useful as a predictor in this model)
- Entity Name and Entity ID (they add no value to the predictive process)

With the fields chosen, the next step was to divide the data into two sets: a training and testing set, and the set for which target values were desired in the class field Likely to Buy TPS. The training/testing set contains 15,775 rows and the to-be-determined data set has 143,309 rows (there were 159,088 rows in total). To extract the training and testing set, all rows were pulled where Likely to Buy TPS had a Y or an N value.

Having developed the training and testing set, the next step was to determine which attributes would provide the most value in the chosen predictive model. To help narrow the field of candidate attributes (Table 3), domain knowledge was combined with experience and knowledge of the J48 algorithm to arrive at the first set of candidate attributes. Table 3 was loaded into Weka for the initial run of the J48 algorithm.

IV. DATA MINING ALGORITHM

For this project, the J48 classification algorithm in Weka was chosen as it is a perfect fit for determining classification for given instances. In this project, we seek to classify each instance as likely or not likely to accept an offer for cash management (TPS) services. This is a decision tree algorithm, a very powerful
predictive tool with the following characteristics to recommend it:
- It is rather intuitive and readily understandable by non-technical audiences
- It is adept at working with multiple types of data (numerical, nominal)
MS Access fields (with type and, where applicable, the source or Access pseudocode used to derive the field):
- Entity Name
- Entity Type
- Entity ID (Numeric)
- Annual Sales (Numeric)
- Days Since Last Contact (Numeric): = Now() - [Entity].[Last Contact Date]
- Years Since Last Contact (Numeric): = Round([Entity].[Days Since Last Contact] / 365, 1)
- Number of Employees (Numeric)
- Has Parent Entity: = "Y" if [Entity].[Parent Entity] Is Not Null
- Priority
- State
- Annual Revenue Tier
- SIC
- SIC 2-Digit: = Left([Entity].[SIC], 2)
- SIC Category: from SIC table from OSHA website, matched to SIC 2-Digit
- SIC Description
- Has TPS Sales Mgr: = "Y" if [Entity].[TPS Sales Mgr] Is Not Null
- BMO Relationship Role
- Lead Bank
- Lead Bank Categorized: retained top 20 values, changed all remaining to "Other"
- LOB-New
- LOB Segment
- Entity Primary
- Number of Activities (Numeric)
- Number of Contacts (Numeric)
- Number of Opportunities (Numeric)
- Number of Wins (Numeric)
- Days in Stage (Numeric): = [Sales Stage History].[# of Days in Stage]
- TPS Status*:
  = "Success" if [Opportunity].[Category Name] = "Cash Management" and [Opportunity].[Sales Stage] In ("Closed/Won", "04 - Engaged", "05 - Implementation")
  = "Fail" if [Opportunity].[Category Name] = "Cash Management" and [Opportunity].[Sales Stage] In ("Closed/Lost", "08 - Decline", "09 - Inactive", "03 - On Hold")
  = "Fail" if [Opportunity].[Category Name] = "Cash Management" and [Opportunity].[Sales Stage] In ("00 - Long Term Prospect", "01 - Identified Needs", "02 - Pitched /Proposed") and [Sales Stage History].[# of Days in Stage] > 365
  = "In Progress" if [Opportunity].[Category Name] = "Cash Management" and [Opportunity].[Sales Stage] In ("00 - Long Term Prospect", "01 - Identified Needs", "02 - Pitched /Proposed") and [Sales Stage History].[# of Days in Stage] < 180
  = "Not Attempted" if [TPS Status] Is Null
- Has Business Credit: = "Y" if [Opportunity].[Category Name] = "Business Credit"
- Likely to Buy TPS: = "Y" if [TPS Status] = "Success"; = "N" if [TPS Status] = "Fail"

Table 2: MS Access fields
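The TPS Status derivation in Table 2 is the heart of the labeling step. A direct translation of the Access pseudocode into Python (the function names are illustrative; stage names and thresholds are taken verbatim from the table):

```python
# Sketch of the "TPS Status" and "Likely to Buy TPS" derivations from
# Table 2, translated from the Access pseudocode. Stage names and day
# thresholds are taken from the table; the function shape is illustrative.

WON_STAGES = {"Closed/Won", "04 - Engaged", "05 - Implementation"}
LOST_STAGES = {"Closed/Lost", "08 - Decline", "09 - Inactive", "03 - On Hold"}
EARLY_STAGES = {"00 - Long Term Prospect", "01 - Identified Needs",
                "02 - Pitched /Proposed"}

def tps_status(category, sales_stage, days_in_stage):
    if category != "Cash Management":
        return "Not Attempted"
    if sales_stage in WON_STAGES:
        return "Success"
    if sales_stage in LOST_STAGES:
        return "Fail"
    if sales_stage in EARLY_STAGES:
        if days_in_stage > 365:
            return "Fail"         # a stalled early-stage deal counts as a loss
        if days_in_stage < 180:
            return "In Progress"
    # Note: the 180-365 day range is unassigned in the source pseudocode,
    # so it falls through to "Not Attempted" via the Is Null rule.
    return "Not Attempted"

def likely_to_buy_tps(status):
    """Class attribute for Weka: only Success/Fail rows carry a label."""
    return {"Success": "Y", "Fail": "N"}.get(status)  # None -> unlabeled row

print(tps_status("Cash Management", "Closed/Won", 10))  # -> Success
```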
Field/Attribute -- Comments and Intuitive Observations -- Decision (rows without a decision were retained):
- TPS Status: Used to set the historical value in the class attribute Likely to Buy TPS for training data, therefore redundant as a training attribute -- Remove
- Entity ID: Not useful as a predictor -- Remove
- Number of Wins: Intuitively very valuable as an indicator of likelihood to accept offers
- Entity Type: Intuitive that current customers should be more receptive to offers
- Entity Primary: Categorized by LOB Segment; will be useful when running the model for each LOB or LOB segment -- Remove
- LOB Segment: Intuitive that some sales teams are better at cross-sell than others
- Days Since Last Contact: Redundant; Years Since Last Contact is a better choice -- Remove
- Number of Activities: Same as above
- Years Since Last Contact: Same as above
- LOB: Redundant; better served by LOB Segment -- Remove
- Number of Opportunities: Possible indicator of quality and depth of sales relationship
- State: Better represented by "LOB" and "LOB Segment" -- Remove
- Number of Contacts: Possible indicator of depth of relationship, though "count of activities per contact" would be better -- Remove
- Lead Bank Categorized: Some competitors should be easier to win cash management business from
- SIC Category: Possible that some industries are more dependent on cash management services
- Priority: Though arbitrary and subjective, high-priority customers and prospects should correlate with higher likelihood of accepting offers
- BMO Relationship Role: Where BMO is lead bank, should increase likelihood of accepting offers
- Annual Revenue Tier: Unknown if this would be a factor in likelihood to accept cash mgt. offers; clustering would be helpful to better understand
- Has TPS Sales Mgr: Probably not useful as an indicator; not reliably provided in CRM -- Remove
- Number of Employees: Unknown if this would be a factor in likelihood to accept cash mgt. offers; clustering would be helpful to better understand -- Remove
- Has Parent Entity: Unknown if this would be a factor in likelihood to accept cash mgt. offers; clustering would be helpful to better understand -- Remove
- Has Business Credit: "Y" indicates the customer has a loan; probably not useful as a predictor since other attributes accomplish the same goal ("TPS Status", "Entity Type") and rank far higher -- Remove

Table 3: Attributes

- It is also adept at accommodating missing values, a key problem in many data sets
- A standard home personal computer offers outstanding performance for this algorithm (this project ran in 0.2 seconds)

Simply stated, a decision tree resembles a flowchart showing a starting point (root node), decisions made, internal stages as a result of the decisions (internal nodes), and final end points or final decisions (leaf nodes). The following graphic is an annotated copy of the actual decision tree, represented graphically, from this project:

Decision tree algorithms are able to work with the following data characteristics:
- Numeric: Numbers, currency, etc.
- Binary: Yes/No, Buy/Not Buy, Send/Don't Send, etc.
- Dates: (self-explanatory)
- Nominal: Names of categorical values such as Customer, Prospect, etc. Note that the class variable (more on this later) must be nominal, not date nor numeric.
- Unary: Represents a numerical value; examples are the use of hash marks or Roman numerals.
- Null: A very useful feature is that most (if not all) decision tree algorithms, including the Weka J48, can accommodate missing values.

In order for decision tree results to be understandable, useful, and transportable to real business needs, the tree itself must be kept to a reasonable size. The J48 algorithm will assist, but the human operator must also contribute by ensuring the data is well suited to this goal. Some of the characteristics to aim for are as follows:
- Since Weka uses Java, reserved characters must be removed, such as those found on the numeric keys above the letters on a computer keyboard.
- Aggregation: Fields that have too many values will create too many nodes in a decision tree, and should be collapsed or aggregated if possible.
An example in this project is the collapse of SIC (industry descriptions) into eleven categories.

Appropriate Dimensionality and Feature Creation: A reader of this paper who is experienced in decision tree execution might have been alarmed by the initial quantity of fields chosen for extraction from the CRM system, and rightly so. These fields were initially extracted because they were thought to have some potential value to the predictive process, but many were used only for derivation of other fields (SIC was used to derive SIC Category), and some, upon further reflection, were judged to have too little or no value to the predictive process, and so were removed.

As mentioned, the Weka J48 decision tree algorithm also helps ensure a reasonably-sized decision tree. Decision tree models do so by determining the most efficient combination of attributes and where to split them according to their values. The two steps are as follows:
1. Select an attribute to represent the root (starting) node, and build an outbound path and node for each possible value
2. Continue splitting each node until leaf (end) nodes are reached (a leaf node occurs when all values of that attribute are the same within that node)

Following the precepts of Hunt's Algorithm, the J48 algorithm performs this task many times over until the optimal model is derived. Determination of the optimal model is accomplished by measuring the degree of purity or homogeneity of values contained within a node. A node displays perfect purity or homogeneity if all values are the same. A node does not have to display perfect purity to become a leaf node, and the model will keep trying until the aggregate purity of all leaf nodes is as high as possible. Three methods widely employed by decision tree algorithms to determine the optimal degree of purity of each node are:
- Gini Index: lower value is better
- Degree of Entropy: lower value is better
- Information Gain: higher value is better

Please see Appendix A for descriptions and illustrations of these three measures.

A visual review of the resulting decision tree is one measure of the degree of success of each iteration of the J48 model. (Multiple iterations are strongly recommended.) Another visual representation of the degree of success is the confusion matrix. The confusion matrix is simply a table showing the counts of true positives, true negatives, false positives and false negatives. Generally speaking, the predicted values are arranged along one axis and the actual values along the other; in the matrix for this project, predictions appear in the rows and actual values in the columns.
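Before turning to the confusion matrix, the three purity measures just listed can be made concrete with a short computation on a node's class counts (plain Python, not Weka's internals; J48, as an implementation of C4.5, actually uses gain ratio, a normalized variant of information gain):

```python
# Illustrative computation of the three purity measures for a node holding
# the given class counts, e.g. [yes_count, no_count].
from math import log2

def gini(counts):
    """Gini index: 0 for a perfectly pure node; lower is better."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    """Entropy in bits: 0 for a perfectly pure node; lower is better."""
    n = sum(counts)
    return 0.0 - sum((c / n) * log2(c / n) for c in counts if c)

def information_gain(parent_counts, child_counts_list):
    """Parent entropy minus the weighted entropy of the children;
    higher is better."""
    n = sum(parent_counts)
    weighted = sum(sum(c) / n * entropy(c) for c in child_counts_list)
    return entropy(parent_counts) - weighted

print(gini([10, 0]))              # perfectly pure node -> 0.0
print(round(entropy([5, 5]), 3))  # maximally impure two-class node -> 1.0
# Splitting a 10/10 parent into a 10/2 child and a 0/8 child:
print(round(information_gain([10, 10], [[10, 2], [0, 8]]), 3))  # -> 0.61
```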
The following is an annotated version of the confusion matrix generated by the Weka J48 model for this project:

                Actual Y                   Actual N
Predicted Y     9,706 = True Positive     997 = False Positive
Predicted N     235 = False Negative      4,837 = True Negative

These definitions apply to the confusion matrix:
- True Positive: instances where the model predicted a value of Y and the actual value was in fact Y
- True Negative: instances where the model predicted a value of N and the actual value was N
- False Positive: the model predicted a value of Y but the actual value was N
- False Negative: the model predicted a value of N but the actual value was Y

From the confusion matrix, these metrics may be derived and used to ascertain the overall applicability or desirability of the model:
- Accuracy: (True Positives + True Negatives) / (sum of all four values)
- Precision: (True Positives) / (True Positives + False Positives)
- Recall: (True Positives) / (True Positives + False Negatives)

These measures are suitable for attributes with an even data distribution, but are very misleading when values are heavily skewed toward one value. (The accuracy value will be high, but the model will not have assigned any class values for the under-represented class.) Because of this, a much more meaningful measure for determining the success and applicability of a model is the Receiver Operating Characteristic (ROC) curve.

The ROC curve is a visual representation of the data points in a two-dimensional graph. Each classification is represented within the curve. True Positives are plotted on the Y axis and False Positives are shown on the X axis. The ROC value represents the area between the curve and the X axis, or the area under the curve. A higher value is more desirable, and as such, the goal is to choose a point on the curve, as near the upper-left corner as possible, that provides the highest possible count of true positives with a minimum of false positives, or the highest tolerable count of false positives. (In our sales application, false positives are more acceptable than they would be in a medical study, where a false positive might lead to invasive, costly and unnecessary procedures.)
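Applied to this project's confusion matrix, the three metrics defined above work out as follows (a quick check in Python, using the counts reported in the text):

```python
# Accuracy, precision and recall computed from this project's confusion
# matrix (TP, FP, FN, TN values as reported in the text).
tp, fp, fn, tn = 9706, 997, 235, 4837

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"accuracy:  {accuracy:.4f}")   # 0.9219 -- matches Weka's 92.19%
print(f"precision: {precision:.4f}")
print(f"recall:    {recall:.4f}")
```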
Following is the actual ROC curve for this project: We see in this graph that there is a very concentrated cluster directly in the desired area, circled in red in the upper-left quadrant. This indicates an overwhelming concentration of the desired outcome of as many true positives as possible with the smallest count of false positives. This too is reflected in the very high value of 0.9283 for the area under the curve. This concept will be revisited in the following section, wherein the success and suitability of the model will be examined.

V. EXPERIMENTAL RESULTS AND ANALYSIS

This section discusses findings from executing the Weka J48 classification algorithm with the following characteristics:
- The algorithm used was weka.classifiers.trees.J48 -C 0.25 -M 2
- There were 15,775 instances (rows of data) with 12 attributes (including the class attribute) as described previously in the section titled Data Preparation
- The test mode chosen was 10-fold cross-validation

The model constructed a tree with five leaf (end-point) nodes, with 9 nodes in total. This is a very reasonably-sized tree, though it is interesting to note that very few of the attributes appear in the tree. A practical interpretation of this outcome is that the chosen attributes work very well within a predictive model for this set of data, but it would be highly advisable to try different combinations of attributes on smaller subsets of these 15,775 instances. The decision tree is as follows:

Weka also provides a text display of the decision tree:

# of Wins <= 0: N (4856.0/137.0)
# of Wins > 0
|   # of Wins <= 1
|   |   # of Opportunities <= 2: Y (3441.0/149.0)
|   |   # of Opportunities > 2
|   |   |   # of Opportunities <= 4: Y (523.0/204.0)
|   |   |   # of Opportunities > 4: N (222.0/85.0)
|   # of Wins > 1: Y (6733.0/625.0)

The left figure within a set of parentheses is the count (or weight) of instances that wound up in that leaf, and the right figure is the count of misclassified instances.
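The textual tree above maps directly onto nested conditionals. The following Python sketch transcribes it as a reading aid (it is a transcription of the rules, not the Weka model itself; the function name is illustrative):

```python
# The textual J48 tree, transcribed as a plain Python rule function.
# Inputs mirror the CRM attributes "Number of Wins" and
# "Number of Opportunities"; returns the predicted class label.

def predict_tps(num_wins, num_opportunities):
    if num_wins <= 0:
        return "N"      # leaf (4856.0/137.0)
    if num_wins <= 1:
        if num_opportunities <= 2:
            return "Y"  # leaf (3441.0/149.0)
        if num_opportunities <= 4:
            return "Y"  # leaf (523.0/204.0)
        return "N"      # leaf (222.0/85.0)
    return "Y"          # leaf (6733.0/625.0)

print(predict_tps(num_wins=0, num_opportunities=5))  # -> N
print(predict_tps(num_wins=2, num_opportunities=9))  # -> Y
```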
If there are digits to the right of the decimal point, this signifies that there were missing data elements. A prose interpretation of this decision tree might be presented as follows:

1. If a company/entity has no closed/won opportunities (they have accepted none of our offers), then for purposes of this sales campaign they should be ignored, as they are highly likely to reject our offers for cash management services. This represents approximately 33% of our examined records.
2. However, if a company has one or more closed/won opportunities, we should offer cash management services, as there is a very good chance of acceptance. In fact, if the count of closed/won opportunities is greater than one, there is a 91% chance of acceptance according to this model.
3. If there is exactly one closed/won opportunity, then we next consider the total number of opportunities pitched to this company. If the
count is two or fewer, then we have a 95% chance of acceptance of a cash management services offer.
4. If there are more than two opportunities but no more than four, the chance of acceptance is 60%.
5. However, if there are more than four opportunities, the chance of rejection is 62%. Since this is only 1.4% of our examined population, these should be ignored along with the first group.

Given the above interpretation, a suitable recommendation would be to include groups 2 and 3 in a premier campaign to offer cash management services. This encompasses 10,174 companies, or 64% of our sample. Group 4, encompassing 523 companies or 3.3% of our sample, should be in a secondary campaign after the premier campaign is complete.

But how trustworthy is this decision tree? To start, Weka provides these metrics:

Metric                              Count      Ratio
Correctly Classified Instances      14,543     92.19%
Incorrectly Classified Instances     1,232      7.81%
Total Number of Instances           15,775

Certainly these top two metrics are very encouraging, showing an accuracy rating of 0.9219. But recalling that accuracy may not be a good measure of success, we should instead consider our ROC value (see the ROC graph above). Since this value is 0.928, we can be very confident that this is a successful model for this extract of data. (See Appendix B for more of the text output from Weka.)

VI. CONCLUSION

This project sought to construct a reliable predictive model to classify whether a company in our CRM system would be likely or not likely to accept an offer for cash management services. To accomplish this, a sample set was extracted from CRM for which it was known whether a company had already been offered cash management services, and whether or not that company accepted the offer. After several iterations of data preparation, the Weka J48 algorithm eventually produced a model with very respectable metrics: 92% accuracy and an ROC value of 0.928.
Additionally, the decision tree, though it doesn't use many of the attributes provided, is very simple, quite intuitive, passes logical examination by a domain expert, and will be quite practical to implement in a production environment.

VII. FUTURE WORK

Assuming the findings of this project are agreeable, the immediate next step is to re-run the model for the remaining 143,309 CRM records of companies that have not yet been offered cash management services. Once this is done, the results of the decision tree will be loaded into CRM and a campaign will be initiated to target the appropriate companies as indicated by the results of the model. These targeted companies will be flagged in CRM so that, over time, sales effectiveness may be compared between the subjects of this project and those that preceded it.

REFERENCES

Tan, Pang-Ning, Michael Steinbach, and Vipin Kumar. Introduction to Data Mining. Boston: Pearson Addison Wesley, 2005.

Provost, Foster, and Tom Fawcett. Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking. Sebastopol, CA: O'Reilly Media, Inc., 2013.