MARKET SEGMENTATION, CUSTOMER LIFETIME VALUE, AND CUSTOMER ATTRITION IN HEALTH INSURANCE: A SINGLE ENDEAVOR THROUGH DATA MINING




Illya Mowerman
WellPoint, Inc.
370 Bassett Road
North Haven, CT 06473
(203) 654 3188
Illya.mowerman@wellpoint.com

ABSTRACT

In today's business world, with finite financial resources and ever more competition, firms must focus their efforts on both acquiring and retaining the customers that produce the highest profits. Profits, as well as acquisition and retention costs, must therefore be viewed on a long term basis, that is, over the life of the customer with the firm. This paper proposes a methodology for calculating the lifetime value of a customer in the health care insurance industry. With the aid of survival analysis, the methodology yields a customer segmentation based on customers' expected life with the company and their profitability.

INTRODUCTION

Theophrastus, the immediate successor of Aristotle, was among other things a student of character. His pioneering character sketches captured the profound insight that an individual's behavior across seemingly unrelated domains is often highly correlated [4]. It is not uncommon for a person to ask "what kind of a person is he?" in order to predict a future behavior or reaction. Likewise, in many businesses, marketing campaigns are built on studies that capture measures of loyalty, specifically the loyalty of profitable customers.

It is by now well documented that individuals exhibit consistent behavioral tendencies across a range of contexts. The associations are wide ranging and sometimes surprising. Recently, psychologists have found strong linkages between personality measures and how a person walks, how often they smile, what kind of music they like, and how they dress [4]. For businesses, the interest lies in being able to identify groups that are loyal to their brand.

In the health care insurance industry there is a fixation on targeting healthy customers rather than loyal ones. "Healthy" here refers not only to the customer's current state of health, but also to their future health. The ability to predict future health costs has been well studied and is taken into account within the underwriting department. The customer is then given a premium commensurate with their risk, and in some cases denied coverage. Predicting which customers will incur very high costs, usually due to rare illnesses or physical accidents, is for the most part not possible without genetic testing or a crystal ball, neither of which is an option for now. Therefore, the fixation on targeting customers that will never become "train wrecks" is an exercise in futility. Furthermore, by the law of large numbers, if an insurance product is priced correctly, commensurate with the benefits offered, and has enough customers (referred to as members in the health insurance industry), its overall profitability will be positive.

In the health insurance industry, target marketing and customer retention campaigns should be built on loyalty and profitability, as well as other dimensions. The need then arises to group existing and potential members into like groups in order to develop more responsive campaigns.

Traditionally, when customer segmentation is mentioned one immediately thinks of cluster analysis, which groups observations with similar traits. The limitation of this approach in the health insurance industry is that there are time dependent covariates that affect loyalty, and these cannot easily be included in a cluster analysis except through a myriad of indicator variables. After creating a segmentation, the market researcher may formulate a follow-up analysis with a binary outcome, churn versus no churn, to predict who is most likely to churn within each pair of segments found in the cluster analysis. At least two segments must be modeled together because members within a single segment have very similar durations. The difficulty then becomes choosing which pairs of segments to model together to produce meaningful contrasts.
Conversely, the market researcher may decide to model churn on all the observations without taking a segmentation into account, but then she is once again left without the time dependent variables. For this analysis logistic regression, neural networks, and other algorithms can be employed.

The health insurance industry has special considerations when compared to other industries. In this industry, as in other insurance industries, premiums are paid monthly, and customer benefits are for the most part irregular and sparse. Other types of insurance, such as automotive and home insurance, are mandatory, required either by the State or by the bank that issued the loan for the asset, while health insurance in most states is voluntary. When members pay their premium they receive peace of mind in return. Only when they receive a health care service or product do members receive a tangible benefit from their premium. This difference between the value proposition of health insurance companies and that of other businesses means that some dimensions and metrics are specific to the industry.

This paper presents a framework for creating a customer market segmentation and calculating lifetime value for the health insurance industry. Next is the literature review, from which the methodology was derived. The proposed methodology is then presented with a detailed account of the metrics required. Last is the discussion.

LITERATURE REVIEW

Calculating lifetime value, as well as creating a segmentation based on profitability and duration, can be viewed as both a dichotomous outcome problem (churn versus non-churn) and a time series problem. Survival analysis addresses both because it models time to event. Nonetheless, traditional survival analysis models are not well suited for this research. Conventional survival models were developed for small data sets from designed experiments where the purpose of the analysis is to guide scientifically sound conclusions. These methods are often awkward for large databases where the purpose is to guide profitable business decisions [6].

The algorithm to be used is the Discrete Time Logistic Hazard Model, which was first introduced by C. Brown in 1975 [3]. Hazard models based on logistic regression are well suited to the challenging features of survival data mining problems such as discrete time, dependent competing risks, truncated data, time dependent covariates, time varying effects of the covariates, and irregular nonlinear hazards [6]. Traditional survival algorithms cannot deal with all of these conditions together. Hazard models based on logistic regression originated in the field of biostatistics [3], but have rarely been used in medical applications. They are better established in the social sciences [1], [2].

METHODOLOGY

This is a data mining endeavor, and established data mining steps are applied: define the research question, prepare and explore the data, apply data mining algorithms, interpret and analyze results, and disseminate knowledge [5].
The research question is: who are our most loyal customers, who are our least loyal, and how do they break out by gross margin? The data used are transactional data on claims and premiums and demographic data from the health insurance company, together with psychographic and financial data from a vendor of this type of data. What follows is the data preparation methodology needed to build a model that answers the research question. The requirements of the data mining algorithm determine how the data must be prepared. The contribution of this paper lies in the preparation of the time dependent covariates and in the introduction and proposed use of regression splines.
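
As a point of reference for the preparation steps, the discrete time logistic hazard model named in the literature review can be sketched in a few lines. The sketch assumes the standard formulation in which the log-odds of lapsing in a given month is a baseline term plus a linear combination of that month's covariate values, and the survival probability is the running product of one minus the monthly hazard. The function names, coefficient values, and covariates are hypothetical and only illustrate the arithmetic; they are not estimates from the paper's data.

```python
import math

def monthly_hazard(baseline_logit, coefficients, covariates):
    """Probability of lapsing in a month, given survival up to that month.

    Assumes logit(hazard) = baseline_logit + sum(coefficient * covariate).
    """
    linear_predictor = baseline_logit + sum(
        coefficients[name] * value for name, value in covariates.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))

def survival_curve(hazards):
    """Probability that the policy is still in force after each month."""
    curve, alive = [], 1.0
    for hazard in hazards:
        alive *= 1.0 - hazard
        curve.append(alive)
    return curve

# Hypothetical coefficients and covariate values, for illustration only.
coefficients = {"rate_change": 0.02, "no_claims": 0.5}
months = [
    {"rate_change": 0.0, "no_claims": 1},
    {"rate_change": 25.0, "no_claims": 1},
    {"rate_change": 25.0, "no_claims": 0},
]
hazards = [monthly_hazard(-3.0, coefficients, m) for m in months]
survival = survival_curve(hazards)
```

Because the covariates enter month by month, each policy needs one row of covariate values per month in force, which is what drives the data preparation.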

The algorithm requires multiple observations per subject, in this case the insurance policy subscriber, one for each discrete time interval. For this analysis the time interval is one month, because the industry functions, in many ways, on a monthly basis. Premiums, although paid at different intervals at the choice of the subscriber, are calculated monthly, as are other claims-related metrics, which are calculated per member per month (PMPM).

The question now is which statistic to use for each time dependent covariate. For the age of the subscriber, the natural choice is the actual age in the time interval, but for other covariates the appropriate statistic is less obvious. Gross margin, for example, should be a cumulative statistic reported on a PMPM basis, which normalizes for multiple members in a policy, eliminates the confounding effect of time, and gives a broader view of the policy's financial health.

Apart from product changes or changes in the member count within a policy, premiums change on a 12-month anniversary cycle. Exploratory data analysis has shown that the impact of a rate change lasts for three months after the change. Therefore, a field holding the nominal dollar value of the rate change will keep that value for three time intervals, starting from the first month the rate hike is in effect. This allows the model to capture churn, also referred to as lapses in the health insurance industry, that is due to rate increases. Another field, similar in nature, could hold the rate hike as a percentage. These two variables are clearly confounded with each other, however, and one of them should ultimately be removed.

Product changes signal a change in the perceived need of the subscriber. A downgrade in product may signal either the subscriber's realization that his product is too rich in benefits, or perhaps a downturn in his income.
Conversely, an upgrade in product may signal either a perceived future need for more benefits, or an increase in the subscriber's income. In either case, a change in product signals that the subscriber is engaging with the firm to modify the contract, with the ultimate goal of improving the value they perceive from the health insurance company. Therefore, a cumulative variable is created that counts the number of product changes. It is also useful to create an additional variable that indicates whether the change was an upgrade or a downgrade.

The exploratory data analysis also found that policies with no claims are more likely to lapse than policies with at least one claim. The variable that indicates whether a policy has had no claims is cumulative, in the sense that it is set to true until the month in which the first claim is encountered and false from then on.
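
The covariate preparation described above can be sketched as a function that expands one policy into one row per month. This is a minimal illustration under stated assumptions, not the author's implementation: the input layout (`premium`, `claims`, `members`, `tier`, `rate_changes`, `lapsed`), the use of an integer rank for the product tier, and the choice to record the upgrade or downgrade direction only in the month of the change are all hypothetical.

```python
def person_period_rows(policy):
    """Expand one policy into one row per month in force."""
    rows = []
    cum_margin = 0.0         # cumulative premium minus claims, in dollars
    cum_member_months = 0    # cumulative count of member months
    product_changes = 0      # running count of product changes
    no_claims = True         # True until the month of the first claim
    n_months = len(policy["premium"])
    for i in range(n_months):
        month = i + 1
        # Rate change: nominal dollar amount, held for the month it takes
        # effect and the two months that follow.
        rate_change = 0.0
        for start, amount in policy["rate_changes"].items():
            if start <= month < start + 3:
                rate_change = amount
        # Product change: compare this month's tier with last month's.
        change_direction = 0  # +1 upgrade, -1 downgrade, 0 no change
        if i > 0 and policy["tier"][i] != policy["tier"][i - 1]:
            product_changes += 1
            upgraded = policy["tier"][i] > policy["tier"][i - 1]
            change_direction = 1 if upgraded else -1
        if policy["claims"][i] > 0:
            no_claims = False
        cum_margin += policy["premium"][i] - policy["claims"][i]
        cum_member_months += policy["members"][i]
        rows.append({
            "month": month,
            "rate_change": rate_change,
            "product_changes": product_changes,
            "change_direction": change_direction,
            "no_claims": no_claims,
            "margin_pmpm": cum_margin / cum_member_months,
            # The event indicator is 1 only in the final month of a lapsed policy.
            "lapsed": int(policy["lapsed"] and month == n_months),
        })
    return rows

# Hypothetical two-member policy that lapses after six months.
example_policy = {
    "premium": [400.0, 400.0, 400.0, 430.0, 380.0, 380.0],
    "claims": [0.0, 0.0, 150.0, 0.0, 0.0, 0.0],
    "members": [2, 2, 2, 2, 2, 2],
    "tier": [2, 2, 2, 2, 1, 1],
    "rate_changes": {4: 30.0},  # a $30 increase taking effect in month 4
    "lapsed": True,
}
rows = person_period_rows(example_policy)
```

A policy that is still in force at the end of the observation window would be passed with `lapsed` set to false, so that none of its rows carries the event indicator.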

Regression splines are segmented functions composed of polynomials. The joins between the segments are called knots. A regression spline suitable for hazard functions is composed of several cubic segments and a linear end segment, joined smoothly to each other. The function can be parameterized as a linear combination of time and a set of cubic spline basis functions. Several cubic splines are introduced into the model as covariates at equally spaced time intervals. For example, a spline is inserted into the model at every three months. When the algorithm is run, the selection method determines which splines are significant.

The splines that are found significant are then used to segment the population. For example, if the splines at months six, fifteen, and twenty four were found significant, the results would be interpreted as four macro segments: policies that last up to six months, policies that lapse between seven and fifteen months, policies that lapse between sixteen and twenty four months, and policies that last more than twenty four months. We cannot conclude that the policies in the last segment never lapse, because the data are right censored.

With the model built and the macro segments defined, the lifetime value of the policy can be calculated. The calculation of lifetime value, which is well documented, is the net present value of the future returns.

DISCUSSION

In this paper a single data mining endeavor is proposed to segment and profile the customer base of a health insurance company, with the aim of understanding churn behavior and, ultimately, long term profitability. A novel approach to segmentation is presented, based on the interpretation of the splines that are found significant. Last, a discussion of covariates and their proper statistics, specific to the health insurance industry, was presented. The methodology proposed here is tailored to the health insurance industry.
However, this does not limit the proposed methodology to that industry. In particular, the use of splines to derive segments based on churn behavior is applicable to many industries unrelated to health insurance and to insurance in general.

REFERENCES

[1] Allison, P. D. Discrete Time Methods for the Analysis of Event Histories. Sociological Methodology, 1982, Jossey-Bass.
[2] Allison, P. D. Survival Analysis Using the SAS System. SAS Institute, Inc., 1995.
[3] Brown, C. C. On the Use of Indicator Variables for Studying the Time Dependence of Parameters in a Response Time Model. Biometrics, 1975, Vol. 31, 862-872.
[4] Gosling, S. Snoop: What Your Stuff Says about You. New York: Basic Books, 2008.
[5] Hand, D. J., Mannila, H., Smyth, P. Principles of Data Mining. The MIT Press, 2001.
[6] Potts, W. Survival Data Mining: Predictive Hazard Modeling for Customer History Data. SAS Institute, Inc., 2004.
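
The segment interpretation and the lifetime value calculation described in the methodology can be illustrated with a short sketch. It rests on assumptions that the paper does not spell out: that each significant knot closes one macro segment, that the margin for a month is earned only if the policy survives that month, and that an annual discount rate is converted to a monthly rate by compounding. The function names and input values are hypothetical.

```python
def segment_bounds(significant_knots):
    """Turn significant knot months into (first_month, last_month) segments.

    The final segment is open ended (last_month is None), because right
    censoring hides when, or whether, those policies eventually lapse.
    """
    bounds, start = [], 1
    for knot in sorted(significant_knots):
        bounds.append((start, knot))
        start = knot + 1
    bounds.append((start, None))
    return bounds

def lifetime_value(monthly_margins, hazards, annual_discount_rate):
    """Net present value of expected monthly margin, weighted by survival."""
    monthly_rate = (1.0 + annual_discount_rate) ** (1.0 / 12.0) - 1.0
    value, alive = 0.0, 1.0
    for t, (margin, hazard) in enumerate(zip(monthly_margins, hazards), start=1):
        alive *= 1.0 - hazard  # probability the policy survives month t
        value += alive * margin / (1.0 + monthly_rate) ** t
    return value

# Knots at months 6, 15, and 24, as in the example in the methodology.
segments = segment_bounds([6, 15, 24])

# Hypothetical inputs: a flat $50 monthly margin and a 3% monthly hazard.
value = lifetime_value([50.0] * 24, [0.03] * 24, annual_discount_rate=0.05)
```

In practice the hazards would come from the fitted model, and the margin stream would be projected per macro segment.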