Segmentation For Insurance Payments
Michael Sherlock, Transcontinental Direct, Warminster, PA




ABSTRACT
An online insurance agency has built a base of names that responded to different offers from various carriers, some of whom have purchased one or more insurance products. The goal of this project is to analyze response and sales data to understand the relationships among products, offers, and payments for a single carrier, in order to deploy predictive models and segmentation schemes that target the best prospects for acquisition or cross-sell through a multichannel communication strategy. Individual information was overlaid with demographic and psychographic data. A logistic regression model was developed to score individuals on their propensity to convert (i.e., pay for the issued policy). These records were then segmented using a clustering technique, and distinct, descriptive groups were produced to tailor marketing language. Also, a cross-sell matrix was developed to identify products that the same carrier may offer to existing customers.

INTRODUCTION
A single carrier's (referred to here as "carrier") Accidental Death and Dismemberment (ADD) product is sold through two different online portals (URLs). Each portal is distinct, even though the product is identical. Aesthetically, the sites are different; more significantly, the offers vary. Portal A features a deviated premium offer ($1 for the first month of coverage), while Portal B features a bonus offer (a free $10,000 policy for a limited time). The carrier is having a problem with nonpayment of premium. The portals are reaching their traffic and signup goals, but the carrier is disappointed with the proportion of individuals who make their first payment (i.e., conversion). The goal of this project is to identify those most likely to convert and compare them with those least likely to convert.
By utilizing this information, the carrier may adjust its creative and placement strategy as well as its offers.

APPROACH
After data hygiene, the application and purchase files were matched to yield a single master file of 16,456 records. Individuals who were denied coverage were removed from the analysis. A total of 155 demographic and psychographic variables were matched to the file, which was then loaded into SAS for analysis. The seven most predictive variables that can be gathered during the online application process were used. Since every record visited a site and requested a policy, a logistic regression model was built to predict which records are most likely to pay for the policy after it is issued. A series of models was built with a variety of variables, and the best-performing one was selected. The most predictive fields were: face value of the policy, payment method, age, types of credit cards owned, household income, home ownership, and length of residence. Other variables did show some predictive power, but the model was reduced for parsimony.

MODEL CODE
The code below represents how the final logistic regression model was built. The various iterations of the model-building process are omitted here.

ODS HTML;
ODS GRAPHICS ON;
PROC LOGISTIC DATA = client.carrier_final;
  WHERE issue = 1 & free = "N";                     /* issued, non-free policies only */
  CLASS homeowner cc payment / PARAM = REF REF = FIRST MISSING;
  MODEL paid(EVENT = '1') = faceamount payment agecode cc medianhhincome lor*homeowner
        / RSQUARE IPLOTS CLPARM = BOTH LACKFIT CORRB COVB NODUMMYPRINT STB;
  OUTPUT OUT = client.carrier_logit1 PREDICTED = pred1 PREDPROBS = INDIVIDUAL;
  GRAPHICS DFBETAS ROC ESTPROB;   /* ODS Graphics plot requests in older releases; newer releases use PLOTS= on the PROC statement */
RUN;
ODS GRAPHICS OFF;
ODS HTML CLOSE;

ODS HTML FILE = 'c:\documents and Settings\msherlock\My Documents\My SAS Files\CLIENT\CLIENT_OUT\carrierlogit1.html';
PROC PRINT DATA = client.carrier_logit1;           /* same data set written by OUTPUT OUT= above */
  VAR ID issue free paid faceamount payment agecode cc medianhhincome lor homeowner
      _FROM_ _INTO_ IP_0 IP_1 _LEVEL_ pred1;
RUN;
ODS HTML CLOSE;

This code fits the logistic regression model for paid = 1; that is, the issued policy was converted into a paid policy. By default, PROC LOGISTIC models the probability of the lowest ordered response value (here, 0), so EVENT = '1' tells SAS to model the probability of payment instead. Nominal variables (homeowner, credit card type, and payment method) are listed in the CLASS statement so that SAS automatically creates reference-coded dummy variables (PARAM = REF, with the first level as the reference). The interaction term lor*homeowner was used to include length of residence only for homeowners. The OUT= option of the OUTPUT statement writes a scored data set. An OUTMODEL= option was also originally included so that a hold-out data set could be scored to confirm model validity. The variables used in this model were selected after multiple iterations of running the logistic procedure and selecting the best-performing model.

MODELING CONCLUSIONS
Method of payment is by far the most predictive element of the model. Relative to the reference payment level, those paying by credit card have roughly four times the odds of completing the transaction (odds ratio 3.97), and those paying by bank draft (a.k.a. electronic funds transfer, or EFT) have roughly twice the odds (odds ratio 2.18). Those requesting a bill have 53% lower odds of paying (odds ratio 0.47).

Although method of payment is the most predictive element, it is not the only factor. Moreover, the method of payment is not known until an applicant reaches the payment step. This model seeks to identify those most likely to pay before the payment option is selected, thereby identifying those who may need more incentive to pay.

Further data mining revealed that 23% of those who requested a bill and subsequently did not pay it do in fact hold a bank-issued credit card. Likewise, 26% of those who offered EFT as a method of payment and did not pay also hold a bank-issued credit card.
Of those who select the credit card payment option, 78% complete the transaction. Nearly one in four invoice and EFT non-payers could have paid by credit card; if guided into this payment method, completed transactions would increase substantially. Over one third (34%) of those selecting EFT do not pay, and 26% of those non-payers have a credit card. Assuming these card holders would pay at the same 78% rate as current credit card users, encouraging EFT requesters to pay by credit card could raise the paid rate among EFT requesters by roughly 7 percentage points (34% × 26% × 78% ≈ 7%). Nearly three-quarters (71%) of invoices go unpaid, and 23% of these non-payers possess a credit card. Under the same assumption, encouraging invoice requesters to use a credit card could raise their paid rate by roughly 13 percentage points (71% × 23% × 78% ≈ 13%).
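Written out as conditional probabilities, each estimate above is a product of three rates. The last factor carries the key assumption that card holders who switch would pay at the observed 78% rate of current card users, so the result is a gain in the paid rate within each payment group (in percentage points), not a gain in total paid transactions:

```latex
\begin{aligned}
\Delta_{\mathrm{EFT}} &= P(\text{no pay} \mid \text{EFT}) \times P(\text{card} \mid \text{EFT, no pay}) \times P(\text{pay} \mid \text{card}) \\
&= 0.34 \times 0.26 \times 0.78 \approx 0.069 \\
\Delta_{\mathrm{Invoice}} &= P(\text{no pay} \mid \text{Invoice}) \times P(\text{card} \mid \text{Invoice, no pay}) \times P(\text{pay} \mid \text{card}) \\
&= 0.71 \times 0.23 \times 0.78 \approx 0.127
\end{aligned}
```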

MODEL OUTPUT

Model Fit Statistics
Criterion            Intercept Only    Intercept and Covariates
AIC                  19331.39          16975.56
SC                   19338.96          17096.68
-2 Log L             19329.389         16943.555

R-Square 0.1534      Max-rescaled R-Square 0.2071

Testing Global Null Hypothesis: BETA=0
Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio     2385.8338     15    <.0001
Score                2335.4160     15    <.0001
Wald                 1956.3615     15    <.0001

Type 3 Analysis of Effects
Effect               DF    Wald Chi-Square    Pr > ChiSq
faceamount           1     27.5758            <.0001
payment              3     1670.4991          <.0001
agecode              1     16.1692            <.0001
cc                   7     54.8414            <.0001
medianhhincome       1     41.8986            <.0001
LOR*homeowner        2     6.1830             0.0454

Analysis of Maximum Likelihood Estimates
Parameter              DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq   Standardized Estimate
Intercept              1    -0.5291    0.1129           21.9639           <.0001
faceamount             1    -0.00177   0.000337         27.5758           <.0001       -0.0548
Payment (EFT) b        1     0.7804    0.1038           56.4937           <.0001
Payment (Credit) c     1     1.3786    0.0969           202.4303          <.0001
Payment (Invoice) i    1    -0.7514    0.0832           81.5481           <.0001
agecode                1     0.0334    0.00830          16.1692           <.0001        0.0455
cc 1                   1     0.1363    0.0568           5.7661            0.0163
cc 2                   1     0.1803    0.0719           6.2935            0.0121
cc 3                   1     0.2693    0.0573           22.0917           <.0001
cc 4                   1    -0.1249    0.2672           0.2184            0.6402
cc 5                   1     0.8069    0.1881           18.3977           <.0001
cc 6                   1    -0.6402    0.3972           2.5975            0.1070
cc 7                   1     1.0041    0.2750           13.3323           0.0003
medianhhincome         1     0.0809    0.0125           41.8986           <.0001        0.0691
LOR*homeowner 1        1    -0.0150    0.00843          3.1776            0.0747
LOR*homeowner 2        1     0.00369   0.00324          1.2961            0.2549
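Each row of the maximum likelihood table can be checked by hand: the Wald chi-square is (estimate / SE) squared, the odds ratio is exp(estimate), and the 95% Wald limits are exp(estimate ± 1.96 × SE). The sketch below recomputes these for the three payment levels from the printed values (small differences from the printed output come from the rounding of the displayed estimates). The data set names work.or_check and work.prob_shift and the 20% baseline probability are hypothetical, chosen only to show that an odds ratio of about 4 does not quadruple the probability of payment.

```sas
/* Recompute Wald chi-square, odds ratio and 95% Wald limits from the */
/* printed estimates and standard errors of the three payment levels. */
DATA work.or_check;
  LENGTH parameter $12;
  INPUT parameter $ estimate stderr;
  z          = PROBIT(0.975);              /* 1.96: two-sided 95% normal quantile   */
  wald_chisq = (estimate / stderr)**2;     /* (estimate / SE) squared               */
  odds_ratio = EXP(estimate);              /* reference coding: level vs reference  */
  lower_cl   = EXP(estimate - z * stderr);
  upper_cl   = EXP(estimate + z * stderr);
  DATALINES;
EFT_b      0.7804  0.1038
Credit_c   1.3786  0.0969
Invoice_i -0.7514  0.0832
;
RUN;

/* Odds ratios multiply odds, not probabilities: applied to a hypothetical */
/* 20% baseline, an odds ratio of 3.969 gives about 49.8%, not 79%.        */
DATA work.prob_shift;
  base_prob  = 0.20;                       /* hypothetical baseline probability */
  odds_ratio = 3.969;
  new_odds   = base_prob / (1 - base_prob) * odds_ratio;
  new_prob   = new_odds / (1 + new_odds);
RUN;

PROC PRINT DATA = work.or_check NOOBS;
RUN;
```

The recomputed EFT row comes out at an odds ratio of about 2.182 with limits of about 1.78 and 2.67, in line with the Odds Ratio Estimates table.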

Odds Ratio Estimates
Effect               Point Estimate    95% Wald Confidence Limits
faceamount           0.998             0.998    0.999
payment b vs ref     2.182             1.780    2.675
payment c vs ref     3.969             3.283    4.800
payment i vs ref     0.472             0.401    0.555
agecode              1.034             1.017    1.051
cc 1 vs 0            1.146             1.025    1.281
cc 2 vs 0            1.198             1.040    1.379
cc 3 vs 0            1.309             1.170    1.465
cc 4 vs 0            0.883             0.523    1.490
cc 5 vs 0            2.241             1.550    3.240
cc 6 vs 0            0.527             0.242    1.148
cc 7 vs 0            2.729             1.592    4.679
medianhhincome       1.084             1.058    1.111

Association of Predicted Probabilities and Observed Responses
Percent Concordant    70.6        Somers' D    0.418
Percent Discordant    28.8        Gamma        0.420
Percent Tied           0.6        Tau-a        0.201
Pairs             49424012        c            0.709

The ROC curve plots sensitivity (the true positive rate) against 1 - specificity (the false positive rate). The rapid climb on the left-hand side of the curve shows that the model predicts policy conversion well. The estimated area under the curve (c) is approximately 0.71: for a randomly chosen payer and non-payer, the model gives the payer the higher predicted probability about 71% of the time (counting ties as half). An area of 0.5 would produce a straight diagonal line, meaning the model ranks payers no better than chance.
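The association statistics follow directly from comparing every payer with every non-payer: c = (concordant + 0.5 × tied) / pairs, Somers' D = (concordant - discordant) / pairs, and gamma = (concordant - discordant) / (concordant + discordant). The sketch below counts the pairs with a PROC SQL self-join on a tiny hypothetical data set (work.scored, with pred1 and paid named after the model's variables). PROC LOGISTIC may group predicted probabilities into small intervals before counting pairs, so results on real data can differ slightly.

```sas
/* Hypothetical toy scored data: pred1 = predicted probability, paid = 0/1 outcome */
DATA work.scored;
  INPUT pred1 paid;
  DATALINES;
0.9 1
0.8 0
0.3 1
0.2 0
;
RUN;

/* Compare every payer (e) with every non-payer (n) */
PROC SQL;
  CREATE TABLE work.assoc AS
  SELECT SUM(e.pred1 > n.pred1) AS concordant,
         SUM(e.pred1 < n.pred1) AS discordant,
         SUM(e.pred1 = n.pred1) AS tied,
         COUNT(*)               AS pairs
  FROM work.scored(WHERE = (paid = 1)) AS e,
       work.scored(WHERE = (paid = 0)) AS n;
QUIT;

DATA work.assoc;
  SET work.assoc;
  c          = (concordant + 0.5 * tied) / pairs;                 /* area under the ROC curve */
  somers_d   = (concordant - discordant) / pairs;
  gamma_stat = (concordant - discordant) / (concordant + discordant);
RUN;
```

Plugging in the percentages from the table gives c = (70.6 + 0.5 × 0.6) / 100 = 0.709 and Somers' D = (70.6 - 28.8) / 100 = 0.418, matching the reported values.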

DECILE RESULTS
The file was deciled by applying the scoring algorithm to all records and then splitting the file into ten portions of equal size to gauge the lift realized by applying the model. In the table below, "% paid" is the share of all paying records that falls in each decile, so the cumulative column reaches 100%.

              By Decile             Cumulative
              % paid   % of file    % paid   % of file
Decile 1      20%      10%          20%      10%
Decile 2      18%      10%          38%      20%
Decile 3      12%      10%          50%      30%
Decile 4       9%      10%          59%      40%
Decile 5       8%      10%          67%      50%
Decile 6       7%      10%          75%      60%
Decile 7       7%      10%          82%      70%
Decile 8       7%      10%          88%      80%
Decile 9       6%      10%          95%      90%
Decile 10      5%      10%         100%     100%

The graph below further illustrates how the predicted payment changes as one goes deeper into the file by decile:

[Figure: Predicted Payment (0% to 90%) by decile, Decile 1 through Decile 10]

The model output and analysis above are included to illustrate the validity of the model. Based on demographic information as well as payment type, one may predict the likelihood of payment. Using this information alone, the carrier has a general idea of which prospects are most likely to pay. To further hone the carrier's strategy, a segmentation analysis was produced to group prospects into those most likely to pay and those least likely to pay. This analysis follows.
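A gains table like the decile table above can be built by ranking records on the model score, cutting them into ten equal groups, and accumulating each group's share of payers. The sketch below is one way to do this with PROC RANK and PROC SQL; it is not the paper's code. It uses a hypothetical 20-record toy data set (work.scored); on real data one would point PROC RANK at the scored output of the OUTPUT statement (client.carrier_logit1, with pred1 and paid).

```sas
/* Hypothetical toy data: 20 records, the 5 highest scores are payers */
DATA work.scored;
  DO id = 1 TO 20;
    pred1 = id / 20;
    paid  = (id > 15);
    OUTPUT;
  END;
RUN;

/* Decile 0 = highest predicted probability */
PROC RANK DATA = work.scored OUT = work.ranked GROUPS = 10 DESCENDING;
  VAR pred1;
  RANKS decile;
RUN;

/* Share of all payers and share of the file falling in each decile */
PROC SQL;
  CREATE TABLE work.gains AS
  SELECT decile + 1 AS decile_no,
         SUM(paid) / (SELECT SUM(paid) FROM work.ranked) AS pct_paid,
         COUNT(*)  / (SELECT COUNT(*)  FROM work.ranked) AS pct_file
  FROM work.ranked
  GROUP BY decile
  ORDER BY decile_no;
QUIT;

/* Running totals and cumulative lift (cumulative % paid / cumulative % of file) */
DATA work.gains;
  SET work.gains;
  cum_paid + pct_paid;      /* sum statement: value is retained across rows */
  cum_file + pct_file;
  cum_lift = cum_paid / cum_file;
RUN;
```

Read against the paper's table, the first decile's 20% of payers in 10% of the file corresponds to a cumulative lift of 2.0, and the first two deciles (38% of payers in 20% of the file) to a cumulative lift of 1.9.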

SEGMENTATION
After the predictive model was built, the scored records were grouped to identify key clusters of good-pay and bad-pay individuals. The process resulted in twelve distinct clusters in which records perform similarly to one another within the cluster and dissimilarly to records in other clusters. Interestingly, only 24% of the final file contained records that were homeowners; however, three of the four best-pay clusters were 100% homeowners.

SEGMENTATION CODE
Segmentation is a multi-step process. A number of iterations were performed to produce the best-fitting segments; the procedures that follow use the best group of variables.

/* Step 1: Approximate Covariance Estimation for Clustering (ACECLUS) */
PROC ACECLUS DATA = client.carriersegment
             OUT = client.carrierlogit1aceclus
             OUTSTAT = client.carrierlogit1aceclusstat
             PROPORTION = .03 PP;
  VAR pred1 medianhhincome agecode renter;
RUN;

/* Step 2: Clustering */
PROC FASTCLUS DATA = client.carrierlogit1aceclus MAXCLUSTERS = 12 MAXITER = 10
              OUT = client.carrierlogit1fastclus;
  VAR Can1 Can2 Can3 Can4;   /* canonical variables created by ACECLUS */
RUN;

/* Step 4: Descriptives */
ODS HTML FILE = 'c:\documents and Settings\msherlock\My Documents\My SAS Files\CLIENT\CLIENT_OUT\Carrier_clusters1.html';
PROC UNIVARIATE DATA = client.carrierlogit1fastclus PLOTS;
  VAR pred1 paid medianhhincome agecode renter bankcard retail;
  CLASS cluster (ORDER = INTERNAL);
RUN;
ODS HTML CLOSE;

/* Additional descriptives */
ODS HTML FILE = 'c:\documents and Settings\msherlock\My Documents\My SAS Files\CLIENT\CLIENT_OUT\Carrier_clusters2.html';
PROC UNIVARIATE DATA = client.carrierlogit1fastclus PLOTS;
  VAR bankdraft creditcard invoice faceamount lor sqinsurl psurl;
  CLASS cluster (ORDER = INTERNAL);
RUN;
ODS HTML CLOSE;

The ACECLUS procedure approximates the pooled within-cluster covariance matrix and outputs canonical variables (Can1 through Can4, one per input variable) in which the clusters are closer to spherical. The resulting data set is then run through the FASTCLUS procedure, which groups the records into twelve clusters by nearest-centroid (k-means-style) clustering on those canonical variables. Once this task is complete, the clusters are profiled on the original input variables as well as other variables to give a full view of what these records look like. The procedure output is omitted here; what follows is an analysis of the UNIVARIATE results by cluster, with similar clusters compared side by side.
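The profile tables that follow report, for each cluster, figures such as the average predicted payment and the share of renters. Assuming indicator variables like paid and renter are coded 0/1, those figures are simply cluster means, so a compact PROC MEANS step is a lighter-weight alternative to the UNIVARIATE runs above. The sketch uses a hypothetical four-record toy data set (work.toyclus); on real data the input would be client.carrierlogit1fastclus and the VAR list the profile variables of interest.

```sas
/* Hypothetical toy clustered data; renter and paid assumed to be 0/1 indicators */
DATA work.toyclus;
  INPUT cluster pred1 paid renter;
  DATALINES;
1 0.80 1 0
1 0.70 1 0
2 0.30 0 1
2 0.20 0 1
;
RUN;

/* One row per cluster; the mean of a 0/1 indicator is a percentage */
PROC MEANS DATA = work.toyclus NWAY NOPRINT;
  CLASS cluster;
  VAR pred1 paid renter;
  OUTPUT OUT = work.cluster_profile MEAN = ;   /* keep the original variable names */
RUN;

PROC PRINT DATA = work.cluster_profile NOOBS;
  FORMAT pred1 paid renter PERCENT8.0;
RUN;
```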

THE RENTERS
                          Cluster 2       Cluster 3       Overall
Predicted Payment         71%             26%             40%
Income                    $30-$39,999     $30-$39,999     $30-$39,999
Age                       35-39           25-29           40-44
Home Owner                100% renters    100% renters    24% renters
Gender                    53% male        38% male        54% male
Marital Status            34% married     30% married     68% married
Length of Residence       Under 5 years   Under 5 years   Under 12 years
Face Value of Policy      $118,100        $136,400        $132,700
Dev. Prem. / Bonus Offer  16% / 67%       12% / 74%       13% / 66%
EFT / Credit Card         35% / 62%       0% / 0%         7% / 14%

Cluster 2 was nearly three times as likely to pay as cluster 3 (71% versus 26%), despite the fact that the two clusters have much in common. Both have the same household income band, rent their homes, and have lived at their current address for under five years. The good-pay group, cluster 2, had more men than cluster 3, a proportion closer to that of the whole sample. Cluster 2 was somewhat older than cluster 3 (35-39 versus 25-29, respectively). Also, cluster 2 tended to choose policies with lower face values than cluster 3, resulting in lower monthly premiums.

THE HOMEOWNERS
                          Cluster 5       Cluster 7       Overall
Predicted Payment         80%             32%             40%
Income                    $40-$49,999     $40-$49,999     $30-$39,999
Age                       55-59           35-39           40-44
Home Owner                100% own        100% own        76% own
Gender                    68% male        60% male        54% male
Marital Status            76% married     53% married     68% married
Length of Residence       6+ years        2-11 years      Under 12 years
Face Value of Policy      $124,500        $139,300        $132,700
Dev. Prem. / Bonus Offer  18% / 60%       13% / 65%       13% / 66%
EFT / Credit Card         24% / 73%       0% / 0%         7% / 14%

Both of these groups own their homes and have an annual household income of $40,000-$49,999, and both tend to be married males. Cluster 5, which is more than twice as likely to pay (80% versus 32%), is considerably older (55-59 versus 35-39) and more likely to be married (76% versus 53%) than cluster 7. Cluster 5 also tends towards less expensive policies.

THE UPPER-MIDDLE
                          Cluster 12      Cluster 10      Overall
Predicted Payment         77%             38%             40%
Income                    $50-$74,999     $50-$74,999     $30-$39,999
Age                       40-44           50-54           40-44
Home Owner                100% own        100% own        76% own
Gender                    66% male        66% male        54% male
Marital Status            62% married     68% married     68% married
Length of Residence       3-13 years      5-18 years      Under 12 years
Face Value of Policy      $140,000        $140,400        $132,700
Dev. Prem. / Bonus Offer  24% / 53%       13% / 63%       13% / 66%
EFT / Credit Card         29% / 65%       0% / 0%         7% / 14%

One might expect the most affluent group to be the best payer of all the clusters. However, there is a distinct difference between the two most affluent groups: one pays well, the other does not. Key differences are found in age, offer, and payment method. These two clusters contain the records with the highest household income in the sample. Both tend to consist of married men who own their homes and seek policies of around $140,000. These commonalities aside, cluster 12, with the 40-44 year-olds, is twice as likely to pay its bill as cluster 10, with the 50-54 year-olds.

THE POWER OF THE OFFER
                          Cluster 1       Cluster 8       Overall
Predicted Payment         72%             30%             40%
Income                    $20-$29,999     $20-$29,999     $30-$39,999
Age                       45-49           45-49           40-44
Home Owner                100% own        100% own        76% own
Gender                    63% male        59% male        54% male
Marital Status            64% married     62% married     68% married
Length of Residence       3-13 years      3-15 years      Under 12 years
Face Value of Policy      $126,500        $128,800        $132,700
Dev. Prem. / Bonus Offer  19% / 61%       10% / 70%       13% / 66%
EFT / Credit Card         34% / 58%       0% / 0%         7% / 14%

Clusters 1 and 8 are nearly identical demographically, yet cluster 1 is more than twice as likely to pay (72% versus 30%). The most striking difference is that no one in cluster 8 volunteered a credit card or bank draft as a method of payment; arguably, those in cluster 8 simply do not trust the online channel with sensitive banking information. The two groups also differ in the kind of offer they responded to: those in cluster 1, the better-pay group, are more likely to have come through a deviated premium offer (19% versus 10%), while cluster 8 tends towards the bonus offer (70% versus 61%).

OTHER LOW-PROBABILITY-TO-PAY CLUSTERS
                          Cluster 6       Cluster 9       Cluster 11      Cluster 4
Predicted Payment         37%             31%             24%             26%
Income                    $30-$39,999     $20-$29,999     $15-$19,999     $40-$49,999
Age                       65-69           55-59           30-34           45-49
Home Owner                100% own        100% renters    100% renters    100% renters
Gender                    59% male        46% male        37% male        46% male
Marital Status            66% married     38% married     22% married     49% married
Length of Residence       6+ years        1-9 years       Under 5 years   Under 5 years
Face Value of Policy      $126,500        $128,100        $134,200        $139,400
Dev. Prem. / Bonus Offer  11% / 65%       11% / 72%       11% / 73%       10% / 76%
EFT / Credit Card         1% / 0%         4% / 3%         2% / 0%         1% / 0%

Cluster 6 is the oldest: most of the file is 40-44, while this cluster is 65-69. Its predicted payment is 37%.

Clusters 9 and 11 are the poorest. Cluster 11's household income of $15,000-$19,999 is well below the $30,000-$39,999 seen for most of the file, and its predicted payment is 24%. Cluster 9's household income is $20,000-$29,999, and its predicted payment is 31%.

Cluster 4 is anomalous in that it contains relatively affluent renters who are slightly older than most of the file, and yet it is unlikely to pay. Besides being highly likely to request a bill rather than use a credit card or EFT, cluster 4 is the cluster most likely to have come through a bonus offer rather than a deviated premium offer. Its predicted payment is 26%.

CROSS-SELLING RESULTS
Since multiple lists from a variety of providers were available, a cross-selling matrix was developed to identify commonalities among groups that investigate multiple providers. The greatest crossover existed between the ADD policies (analyzed above) and the adult term-life product from the same carrier. Six clusters were produced to examine interactions among variables and interest in both products, or lack thereof. Only homeowners were interested in both the ADD and adult term-life products. Among women, interest in both products was concentrated among those aged 40-44, 65% of whom were married. The only group of men interested in both products was aged 45-49, 84% of whom were married. Together, these two groups represent 82% of all records interested in both products.
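The paper does not show how the cross-selling matrix was built. One common construction is a product co-occurrence count: for every pair of products, the number of individuals who appear on both lists. The sketch below does this with a PROC SQL self-join on a hypothetical long-format table (work.holdings, one row per individual and product); the product codes 'ADD' and 'TERM' are illustrative stand-ins for the ADD and adult term-life products.

```sas
/* Hypothetical holdings: one row per individual and product of interest */
DATA work.holdings;
  INPUT id $ product $;
  DATALINES;
c1 ADD
c1 TERM
c2 ADD
c3 TERM
c3 ADD
;
RUN;

/* Off-diagonal cells of the cross-sell matrix: individuals interested in both products */
PROC SQL;
  CREATE TABLE work.cross_sell AS
  SELECT a.product             AS product_a,
         b.product             AS product_b,
         COUNT(DISTINCT a.id)  AS n_customers
  FROM work.holdings AS a, work.holdings AS b
  WHERE a.id = b.id AND a.product < b.product      /* each unordered pair once */
  GROUP BY a.product, b.product
  ORDER BY n_customers DESC;
QUIT;
```

Joining the resulting individuals back to their cluster assignments would then show which segments account for the overlap, as in the 82% figure above.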

CONCLUSION
By applying this segmentation scheme to the online portals, the carrier may identify the likelihood that a policy will be paid for. Since the scheme is not based solely on payment type, the carrier may intelligently steer those who most need it towards more immediate payment options. The model may be applied not only online but also in an offline re-contact strategy to ensure payment of policies. In addition, the cross-selling opportunities identified here may be used to increase the book of business. When cross-selling, one may choose to approach only those in the most-likely-to-pay clusters to create greater efficiencies.

Although the EFT method of payment is preferred in the industry, one must realize that, due to public concern about identity theft, it is quite difficult to get consumers to volunteer checking account routing numbers online. The industry is opposed to accepting credit cards, due primarily to the increased transaction cost, but credit cards are the universal currency of the online space.

There are also quite a few opportunities for further analysis, such as the following:
- Include marketing costs and upstream marketing source data in the model, using cost-per-lead and cost-per-policy analysis to identify the most valuable media type.
- Include profitability data to quantify the earnings potential of switching policies to credit card payment after deducting the card transaction fee.
- Include attrition data to derive lifetime value for use in acquisition and retention marketing.
- With longitudinal data, build a frequency model to determine the proper timing of online and/or offline solicitations to cross-sell other products.

ACKNOWLEDGEMENTS
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Michael Sherlock
Transcontinental Direct
75 Hawk Road
Warminster, PA 18974
267.960.3161
mjsherlock@gmail.com