Distance to Event vs. Propensity of Event: A Survival Analysis vs. Logistic Regression Approach




Abhijit Kanjilal, Fractal Analytics Ltd.

Abstract: In the analytics industry today, logistic regression is a robust and proven technique for predicting the propensity of certain events, and as a result it has been very popular in marketing analytics. Given the same marketing problems, survival analysis answers them slightly differently: it predicts a (time) distance to an event instead of the propensity of the event. With the objective of assessing the predictability, robustness, and strength of the survival analysis approach compared to logistic regression, we carried out an analysis in the context of an Activation Management Program. The results show that one survival analysis model is as good as multiple logistic regressions for different prediction windows, in terms of predictability and strength.

When we talk about marketing activity in a customer's lifetime, it probably starts with activation and ends with attrition. These events basically constitute the (obviously closed) boundaries of the premise of marketing. Hence effective marketing around these events is crucial for organizations to drive significant profitability. This demands the prediction of such events and the definition of strategies based on a customer's inclination towards them.

The entire challenge becomes measuring a customer's inclination towards marketing events. This can be done in two ways: a) through a Propensity of Event approach, or b) through a Distance to Event approach. The propensity of event approach scores a probability of the event for each customer. It inherently assumes a prediction window in which customers are assigned a probability of the event; for example, the activation probability of newly acquired customers in the next 12 months. For this example, we would typically build a logistic model.
Now, think about the customers who are likely to activate within the first six months on book and the customers who are likely to activate within the first 12 months on book. The profiles of these two sets may differ, hence predictions from P6 (activation probability within 6 months) and P12 (activation probability within 12 months) may not be comparable. To understand the profiles of these two sets we need to build two models, one with a prediction window of six months and another with a window of 12 months. If we divide the window further, we end up building more models. Is there a technique that gives the solution in one go? The answer is survival analysis.

Medical science has been using this technique extensively, and we should keep in mind that medical science is very sensitive to errors in prediction. If marketing is also very sensitive to errors in targeting the right customers at the right time with the right offers, then it makes sense to explore the technique. Does it add any value to the process? This gives us the motivation to assess survival analysis against logistic regression in the context of an Activation Management Program.

Theoretical Model: Logistic regression is a model used to predict the probability of occurrence of an event. In this case our target variable is binary, e.g. attrited or not, or card activated or not. It makes use of predictive variables, either numerical or categorical, that contain all available information about the objects.

$$\operatorname{logit}(p_i) = \ln\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{1,i} + \dots + \beta_k x_{k,i},$$

where $p_i$ is the probability of the event for the $i$-th object, the $x$'s are the risk factors affecting the event, and the $\beta$'s are the parameters of the model. Hence the probability boils down to

$$p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_{1,i} + \dots + \beta_k x_{k,i})}}.$$

In logistic regression, we are interested in studying how risk factors are associated with the target event. Sometimes, though, we are interested in how a risk factor affects the time to the event. Survival analysis is used to analyze data in which the time until the event is of interest. The response is often referred to as a failure time, survival time, or event time. The beauty of survival analysis is that it can tackle the problem of censored data. Censoring is present when we have some information about a subject's event time, but we don't know the exact event time. There are several survival analysis techniques, of which the Cox proportional hazards model is the most popular. In this research we focused only on this technique.

Let $T$ denote the time of the event. Our data, based on a sample of size $n$, consists of a triple $(T_j, \delta_j, X_j)$, $j = 1, \dots, n$, where $T_j$ is the time on study for the $j$-th customer, $\delta_j$ is the event indicator, and $X_j$ is the vector of risk factors affecting the distance to event. Now, the hazard rate at time $t$ for a customer with risk factors $X$ is modeled as

$$h(t \mid X) = h_0(t)\, c(\beta^\top X),$$

where $h_0(t)$ is an arbitrary baseline hazard rate, $\beta$ is the parameter vector, and $c(\cdot)$ is a known function. Because $c(\beta^\top X)$ must be positive, a common model for it is

$$c(\beta^\top X) = \exp(\beta^\top X).$$

The model estimates are obtained by maximizing the partial likelihood function. In the standard Cox form with no tied event times, taken over the distinct activation times $t_1 < t_2 < \dots < t_D$, this is

$$L(\beta) = \prod_{i=1}^{D} \frac{\exp(\beta^\top X_{(i)})}{\sum_{j \in R(t_i)} \exp(\beta^\top X_j)},$$

where $X_{(i)}$ is the risk factor vector of the customer who activates at $t_i$ and $R(t_i)$ is the set of customers still dormant just before $t_i$.

Empirical Model: Having these two techniques in place, we tried to address an Activation Management Program for a bank credit card. Our objective was to assess customers' inclination towards activation within six months after acquisition and, based on the results, to prioritize promotional strategy and speed up the activation of customers beyond their natural pace. Customers' application information was used to build the models. We divided the six-month prediction window into three parts (0-2 months, 0-4 months, and 0-6 months) and built three logistic regression models to predict two-, four-, and six-month activation probability. On the other hand, we built one survival analysis model and computed two-, four-, and six-month activation probability from it.

To construct this solution, we first fit a proportional hazards model to the data and obtain the partial likelihood estimate $b$ of $\beta$. Let $t_1 < t_2 < \dots < t_D$ be the distinct distances to activation and $d_i$ be the number of customers activating at $t_i$. Let

$$W(t_i; b) = \sum_{j \in R(t_i)} \exp(b^\top X_j).$$

The estimator of the cumulative baseline activation rate is given by

$$\hat{H}_0(t) = \sum_{t_i \le t} \frac{d_i}{W(t_i; b)},$$

and the estimator of the baseline dormancy function by

$$\hat{S}_0(t) = \exp\left(-\hat{H}_0(t)\right).$$

The dormancy probability of a new customer with a given set of risk factors $X_0$ will be

$$\hat{S}(t \mid X_0) = \hat{S}_0(t)^{\exp(b^\top X_0)},$$

and

Activation Probability = 1 − Dormancy Probability.
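The baseline estimator and the activation probability described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the coefficient vector has already been estimated elsewhere, and the function names and toy data are hypothetical.

```python
import numpy as np

def breslow_baseline(times, events, X, b):
    """Cumulative baseline activation rate H0 at each distinct activation time.

    times  : time on study T_j for each customer
    events : event indicator delta_j (1 = activated, 0 = censored)
    X      : (n, k) matrix of risk factors
    b      : (k,) coefficient vector, assumed to be estimated already
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    risk = np.exp(np.asarray(X, dtype=float) @ np.asarray(b, dtype=float))
    t_distinct = np.unique(times[events == 1])
    increments = []
    for t in t_distinct:
        d_i = np.sum((times == t) & (events == 1))  # activations at t_i
        w_i = np.sum(risk[times >= t])              # W(t_i; b) over the risk set
        increments.append(d_i / w_i)
    return t_distinct, np.cumsum(increments)

def activation_probability(t, t_distinct, H0, x0, b):
    """1 - dormancy probability at time t for risk factors x0."""
    # H0 is a step function: use its last value at or before t (0 before t_1).
    idx = np.searchsorted(t_distinct, t, side="right") - 1
    h0_t = H0[idx] if idx >= 0 else 0.0
    dormancy = np.exp(-h0_t) ** np.exp(np.dot(x0, b))
    return 1.0 - dormancy

# Hypothetical toy data: months to activation, event flag, one risk factor.
toy_times = [1, 2, 2, 3, 4, 6]
toy_events = [1, 1, 0, 1, 0, 1]
toy_X = [[0.0], [1.0], [0.0], [1.0], [0.0], [1.0]]
toy_b = [0.5]  # assumed coefficient, not estimated here

t_i, H0 = breslow_baseline(toy_times, toy_events, toy_X, toy_b)
# Two-, four-, and six-month activation probability from the single model.
curve = [activation_probability(m, t_i, H0, [1.0], toy_b) for m in (2, 4, 6)]
```

With all coefficients set to zero the baseline reduces to the familiar count-over-risk-set sum, which is a convenient sanity check when adapting the sketch to real data.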

Activation probability is a monotonically non-decreasing function of time. Here lies the beauty of the survival model: to get a comparable time function in a logistic regression setup, we would need to build several models. The results of the analysis are shown below: one survival model vs. multiple logistic models (results are based on actual numbers).
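To make the "several models" point concrete, the sketch below shows how the logistic setup needs one binary target per prediction window, each of which would then get its own coefficient set. It is an illustration under stated assumptions: every customer is assumed to have been observed for the full six months, and the function names and data are hypothetical.

```python
import numpy as np

def window_target(months_to_activation, activated, window):
    """Binary target: 1 if the customer activated within `window` months."""
    t = np.asarray(months_to_activation, dtype=float)
    e = np.asarray(activated, dtype=int)
    return ((e == 1) & (t <= window)).astype(int)

def logistic_probability(x, beta0, beta):
    """p = 1 / (1 + exp(-(beta0 + beta . x))) for one customer."""
    return 1.0 / (1.0 + np.exp(-(beta0 + np.dot(x, beta))))

# Hypothetical customers, all followed for six months on book.
months = [1, 3, 5, 6, 6, 4]
activated = [1, 1, 1, 0, 0, 1]  # 0 = still dormant at month six

# One target vector per window, hence one logistic model per window.
targets = {w: window_target(months, activated, w) for w in (2, 4, 6)}
```

The survival model, by contrast, consumes the single (months, activated) pair directly and yields the whole curve of window probabilities from one fit.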


Conclusion: This case study shows that a single survival analysis model does as good a job as multiple logistic regression models built for different prediction windows, in terms of strength and predictability. Survival analysis differentiates individuals in terms of distance to event, so marketing strategies can be prioritized and optimized to drive higher profitability. The logistic regression model is unable to address the same business problem in a similar manner. It should be noted that the results are based on a limited range of geography and product data. This case study does not take time-dependent risk factors into consideration. It would be useful to generalize the results further in future studies.

Reference:
1. Survival Analysis by John P. Klein and Melvin L. Moeschberger, Springer, 2005
2. Survival Analysis Using SAS: A Practical Guide by Paul D. Allison, SAS, March 2004
3. Modeling Survival Data: Extending the Cox Model (Statistics for Biology and Health) by Terry M. Therneau and Patricia M. Grambsch, Springer, 2005
4. Applied Survival Analysis: Regression Modeling of Time to Event Data by David W. Hosmer, Jr. and Stanley Lemeshow, John Wiley & Sons, 1999