Churn prediction in subscription services: An application of support vector machines while comparing two parameter-selection techniques




Expert Systems with Applications 34 (2008) 313–327
www.elsevier.com/locate/eswa

Kristof Coussement, Dirk Van den Poel *

Ghent University, Faculty of Economics and Business Administration, Department of Marketing, Tweekerkenstraat 2, 9000 Ghent, Belgium

Abstract

CRM gains increasing importance due to intensive competition and saturated markets. With the purpose of retaining customers, academics as well as practitioners find it crucial to build a churn prediction model that is as accurate as possible. This study applies support vector machines in a newspaper subscription context in order to construct a churn model with a higher predictive performance. Moreover, a comparison is made between two parameter-selection techniques, needed to implement support vector machines. Both techniques are based on grid search and cross-validation. Afterwards, the predictive performance of both kinds of support vector machine models is benchmarked against logistic regression and random forests. Our study shows that support vector machines display good generalization performance when applied to noisy marketing data. Nevertheless, the parameter-optimization procedure plays an important role in the predictive performance. We show that only when the optimal parameter-selection procedure is applied do support vector machines outperform traditional logistic regression, whereas random forests outperform both kinds of support vector machines. As a substantive contribution, an overview of the most important churn drivers is given. Unlike in ample prior research, monetary value and frequency do not play an important role in explaining churn in this subscription-services application. Even though the most important churn predictors belong to the category of variables describing the subscription, the influence of several client/company-interaction variables cannot be neglected.

© 2006 Elsevier Ltd. All rights reserved.
Keywords: Data mining; Churn prediction; Subscription services; Support vector machines; Parameter-selection technique

* Corresponding author. Tel.: +32 9 264 89 80; fax: +32 9 264 42 79. E-mail addresses: Kristof.Coussement@UGent.be (K. Coussement), Dirk.VandenPoel@UGent.be (D. Van den Poel).
doi:10.1016/j.eswa.2006.09.038

1. Introduction

Nowadays, more and more companies start to focus on Customer Relationship Management (CRM). Indeed, due to saturated markets and intensive competition, a lot of companies realize that their existing database is their most valuable asset (Athanassopoulos, 2000; Jones, Mothersbaugh, & Beatty, 2000; Thomas, 2001). This trend is also notable in subscription services. Companies are shifting away from their traditional mass-marketing strategies in favor of targeted marketing actions (Burez & Van den Poel, forthcoming). It is more profitable to keep and satisfy existing customers than to constantly attract new customers, who are characterized by a high attrition rate (Reinartz & Kumar, 2003). The idea of identifying those customers most prone to switching carries a high priority (Keaveney & Parthasarathy, 2001). It has been shown that a small change in retention rate can result in significant changes in contribution (Van den Poel & Larivière, 2004).

In order to effectively manage customer churn within a company, it is crucial to build an effective and accurate customer-churn model. To accomplish this, there are numerous predictive-modeling techniques available. These data-mining techniques can effectively assist with the selection of the customers most prone to churn (Hung, Yen, & Wang, 2006). These techniques vary in terms of statistical technique (e.g., neural nets versus logistic regression), variable-selection method (e.g., theory versus stepwise selection), number of

variables included in the model, and time spent to build the final model, as well as in terms of allocating the time across the different tasks in the modeling process (Neslin, Gupta, Kamakura, Lu, & Mason, 2004).

This study contributes to the existing literature by investigating the effectiveness of the support vector machines (SVMs) approach in detecting customer churn in subscription services. Ample research focuses on predicting customer churn in different industries, including investment products, insurance, electric utilities, health care providers, credit card providers, banking, internet service providers, telephone service providers, and online services. Although SVMs have shown excellent generalization performance in a wide range of areas like bioinformatics (Chen, Harrison, & Zhang, 2005; He, Hu, & Harrison, 2005; Zhong, He, Harrison, Tai, & Pan, forthcoming), beat recognition (Acir, 2006), automatic face authentication (Bicego, Grosso, & Tistarelli, 2005), evaluation of consumer loans (Li, Shiue, & Huang, 2006), estimating production values (Chen & Wang, 2007; Pai & Lin, 2005), text categorization (Bratko & Filipic, 2006), medical diagnosis (Glotsos, Tohka, & Ravazoula, 2005), image classification (Kim, Yang, & Seo, 2005) and hand-written digit recognition (Burges & Scholkopf, 1997; Cortes & Vapnik, 1995), the applications in marketing are rather scarce (Cui & Curry, 2005). To our knowledge, only a few implementations of SVMs in a customer-churn environment have been published (Kim, Shin, & Park, 2005; Zhao, Li, & Li, 2005). This study will extend the use of SVMs in a customer-churn context in two ways:

(1) Unlike former studies that implemented SVMs on a very small sample, this study applies SVMs in a more realistic churn setting. Indeed, once a churn model has been built, it must be able to accurately validate a new marketing dataset, which in practice contains tens of thousands of records and often a lot of noise.
This study contributes to the existing literature by using a sufficient sample size for training and validating the SVM models in a subscriber-churn framework. These SVMs are benchmarked against logistic regression and state-of-the-art random forests. Neslin et al. (2004) concluded that logistic modeling may even outperform more sophisticated techniques (like neural networks), while in a marketing setting random forests have already proved to be superior to other, more traditional classification techniques (Buckinx & Van den Poel, 2005; Larivière & Van den Poel, 2005).

(2) Before SVMs can be implemented, several parameters have to be optimized in order to construct a first-class classifier. Extracting the optimal parameters is crucial when implementing SVMs (Hsu, Chang, & Lin, 2004; Kim, Shin et al., 2005; Kim, Yang et al., 2005). Consequently, a fine-tuned parameter-selection procedure has to be applied. Hsu et al. (2004) proposed a grid search combined with cross-validation to extract the optimal parameters for SVMs. This procedure tries different parameter pairs on the training set using a cross-validation procedure. Hsu et al. (2004) propose to select the pair of parameters with the best cross-validation accuracy, i.e., percentage of cases correctly classified (PCC). The second contribution of this study lies in extending this principle by selecting one additional parameter pair. Not only are the parameters with the best cross-validation accuracy selected, as proposed by Hsu et al. (2004), but also the parameter pair which results in the highest cross-validation area under the receiver operating curve (AUC). In contrast to PCC, AUC takes into account the individual class performance by use of the sensitivity and specificity for several thresholds on the classifier's posterior churn probabilities (Egan, 1975; Swets, 1989; Swets & Pickett, 1982). In the end, it is possible to compare the predictive performance of these two parameter-selection techniques with that of logistic regression and random forests.
As a substantive contribution, an overview of the most important churn predictors is given within this subscription-services setting. As such, marketing managers gain insight into which predictors are important in identifying churn. Consequently, it may be possible to adapt their marketing strategies based on this newly obtained information.

Following an introduction of the modeling techniques (i.e., SVMs, random forests and logistic regression), Section 3 explains the evaluation measures used in this study. The model-selection procedure for SVMs is presented in Section 4. Section 5 presents the research data, while Section 6 explains the experimental results. Conclusions and directions for future research are given in Section 7.

2. Modeling techniques

2.1. Support vector machines

The SVM approach is a classification technique grounded in statistical learning theory (Vapnik, 1995, 1998). In a binary classification context, SVMs try to find a linear optimal hyperplane so that the margin of separation between the positive and the negative examples is maximized. This is equivalent to solving a quadratic optimization problem in which only the support vectors, i.e., the data points closest to the optimal hyperplane, play a crucial role. However, in practice, the data is often not linearly separable. In order to enhance the feasibility of linear separation, one may transform the input space via a non-linear mapping into a higher-dimensional feature space. This transformation is done by using a kernel function. There are several advantages to using SVMs (Kim, Shin et al., 2005; Kim, Yang et al., 2005): (1) there are only two free parameters to be chosen, namely the upper bound and the kernel parameter; (2) the solution of an SVM is unique, optimal and global, since the training of an SVM is done by solving a linearly constrained quadratic problem; (3) SVMs are based on the structural risk minimization (SRM) principle, which means that this type of classifier minimizes the upper bound on the actual risk, in contrast to other classifiers, which minimize the empirical risk. This results in very good generalization performance.

We will give a general overview of an SVM for a binary classification problem. For more details about SVMs, we refer to the tutorial of Burges (1998).

Consider a set of labeled training examples {x_i, y_i} with i = 1, 2, 3, ..., N, where y_i ∈ {-1, 1}, x_i ∈ R^n, and n is the dimension of the input space. Suppose that the training data is linearly separable; then there exist a weight vector w and a bias b such that the inequalities

    w · x_i + b ≥ 1,  when y_i = 1,    (1)
    w · x_i + b ≤ -1, when y_i = -1    (2)

are valid for all elements of the training set. As such, we can rewrite these inequalities in the form

    y_i (w · x_i + b) ≥ 1, with i = 1, 2, 3, ..., N.    (3)

Eq. (3) comes down to finding two parallel boundaries,

    B1: w · x + b = 1,     (4)
    B2: w · x + b = -1,    (5)

on opposite sides of the optimal separating hyperplane

    H*: w · x + b = 0,    (6)

with the margin width between the two boundaries equal to 2/||w|| (Fig. 1). Thus one can find the pair of boundaries which gives the maximum margin by minimizing

    (1/2) ||w||^2    (7)

subject to

    y_i (w · x_i + b) ≥ 1.    (8)

Fig. 1. This figure shows the solution for a binary linearly separable classification problem. The boundaries B1 and B2 separate the two classes. Data points on the boundaries are called support vectors. Thus one tries to find the hyperplane H* where the margin is maximal.

This constrained optimization problem can be solved using the characteristics of the Lagrange multipliers α_i, by maximizing

    W(α) = Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i · x_j)    (9)

subject to

    α_i ≥ 0, with i = 1, 2, 3, ..., N, and Σ_i α_i y_i = 0.    (10)

The weight vector can then be stated as follows:

    w = Σ_i α_i y_i x_i.    (11)

The decision function f(x) can be written as

    f(x) = sgn(w · x + b) = sgn( Σ_i α_i y_i (x · x_i) + b ),    (12)

where sgn is the sign function.

In practice, the input data will often not be linearly separable. However, one can still implement a linear model by introducing a higher-dimensional feature space to which an input vector is mapped via a non-linear transformation:

    Φ: X → X',         (13)
    x_i → Φ(x_i),      (14)

where X is the input space, Φ is the non-linear transformation and Φ(x_i) represents the value of x_i mapped into the higher-dimensional feature space X'. Therefore, Eq. (9) can be transformed to

    W(α) = Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j Φ(x_i) · Φ(x_j)    (15)

subject to

    α_i ≥ 0, with i = 1, 2, 3, ..., N, and Σ_i α_i y_i = 0.    (16)

By mapping the input space into a higher-dimensional feature space, the problem of high dimensionality and implementation complexity occurs. To address this, one can introduce the concept of inner-product kernels. Consequently, there is no need to know the exact value of Φ(x_i); only the inner product is considered, which facilitates the implementation (Fig. 2):

    K(x_i, x_j) = Φ(x_i) · Φ(x_j).    (17)

Therefore, the decision function becomes

    f(x) = sgn( Σ_i α_i y_i Φ(x) · Φ(x_i) + b ) = sgn( Σ_i α_i y_i K(x, x_i) + b ).    (18)

For resolving this decision function, several types of kernel functions are available, as given in Table 1.
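To make the kernel trick of Eqs. (17) and (18) concrete, here is a minimal, self-contained Python sketch of evaluating an RBF-kernel decision function. It is illustrative only and not the authors' implementation; in practice the support vectors, multipliers and bias come from solving the quadratic program above, and the toy values used to exercise it are hypothetical.

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    # K(u, v) = exp(-gamma * ||u - v||^2), cf. the radial basis function in Table 1.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_decision(x, support_vectors, labels, alphas, b, kernel=rbf_kernel):
    # Eq. (18): f(x) = sgn( sum_i alpha_i * y_i * K(x, x_i) + b ).
    s = sum(a_i * y_i * kernel(x, x_i)
            for a_i, y_i, x_i in zip(alphas, labels, support_vectors))
    return 1 if s + b >= 0 else -1
```

Because only the support vectors carry non-zero α_i, the sum in practice runs over a small subset of the training data rather than the full training set.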

Fig. 2. The non-linear boundary in the input space is mapped via a kernel function into a higher-dimensional feature space. The data becomes linearly separable in the feature space.

Table 1
Overview of the different kernel functions

    Kernel function                  Mathematical form^a
    Linear kernel                    K(x, x_i) = x · x_i
    Polynomial kernel of degree d    K(x, x_i) = (γ x · x_i + r)^d
    Radial basis function            K(x, x_i) = exp{-γ ||x - x_i||^2}
    Sigmoid kernel                   K(x, x_i) = tanh(γ x · x_i + r)

    ^a d, r ∈ N; γ ∈ R+.

It is possible to extend these ideas to handle non-separable data. In this case, the margin will become very small and it will be impossible to separate the data without any misclassification. To solve this problem, we relax the constraints (1) and (2) by introducing positive slack variables ξ_i (Cortes & Vapnik, 1995). Eqs. (1) and (2) become

    w · x_i + b ≥ 1 - ξ_i,  when y_i = 1,     (19)
    w · x_i + b ≤ -1 + ξ_i, when y_i = -1,    (20)

with ξ_i ≥ 0. Eqs. (19) and (20) can be rewritten as

    y_i (w · x_i + b) ≥ 1 - ξ_i, with i = 1, 2, 3, ..., N.    (21)

The goal of the optimization process is to find the hyperplane that maximizes the margin and minimizes the probability of misclassification:

    minimize (1/2) ||w||^2 + C Σ_i ξ_i    (22)

subject to

    y_i (w · x_i + b) ≥ 1 - ξ_i,    (23)

with C, the cost, the penalty parameter for the error term. The larger C, the higher the penalty on errors. Adapting Eq. (15) to the non-separable case, one obtains the following optimization problem:

    maximize W(α) = Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j)    (24)

subject to

    0 ≤ α_i ≤ C, with i = 1, 2, 3, ..., N, and Σ_i α_i y_i = 0.    (25)

More details concerning the optimization process can be found in Chang and Lin (2004).

2.2. Random forests

In a binary classification context, decision trees (DTs) became very popular because of their ease of use and interpretability (Duda, Hart, & Stork, 2001). Moreover, DTs have the ability to handle covariates measured at different measurement levels. One major problem with DTs is their high instability (Hastie, Tibshirani, & Friedman, 2001).
A small change in the data often results in a very different series of splits, which is often suboptimal when validating the trained model. In the past, this problem was extensively researched. It was Breiman (2001) who introduced a solution to the previously mentioned problem. The new classification technique is called Random Forests. This technique uses a subset of m randomly chosen predictors to grow each tree on a bootstrap sample of the training data. Typically, this number of selected variables, i.e., m, is much lower than the total number of variables in the model. After a large number of trees are generated, each tree votes for the most popular class. By aggregating these votes over the different trees, each case is assigned a predicted class label. Random forests have already been applied in several domains like bioinformatics, quantitative criminology, geology, pattern recognition and medicine. However, the applications in marketing are rare (Buckinx & Van den Poel, 2005; Larivière & Van den Poel, 2005). Random forests are used as a benchmark in this study, mainly for five reasons: (1) Luo et al. (2004) stated that their predictive performance is among the best of the available techniques. (2) The outcomes of the classifier are very robust to outliers and noise (Breiman, 2001). (3) This classifier outputs useful internal estimates of error, strength, correlation and variable importance (Breiman, 2001). (4) Reasonable computation time was observed by Buckinx and Van den Poel (2005). (5) Random forests are easy to implement because there are only two free parameters to be set, namely m, the number of randomly chosen predictors, and the total number of trees to be grown. We follow Breiman's (2001) suggestions: m is set equal to the square root of the total number of variables, i.e., 9, because 82 explanatory variables are included in the model, and a large number of trees, i.e., 1000, is chosen.

2.3. Logistic regression

Logistic regression is a well-known classification technique for predicting a dichotomous dependent variable.
In running a logistic regression analysis, the maximum likelihood function is constructed and maximized in order to achieve an appropriate fit to the data (Allison, 1999). This technique is very popular for mainly three reasons: (1) logit modeling is conceptually simple (Bucklin & Gupta, 1992). (2) A closed-form solution for the posterior probabilities is available (in contrast to SVMs). (3) It provides quick and robust results in comparison to other classification techniques (Neslin et al., 2004).
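As an illustration of the maximum-likelihood estimation just mentioned, the following self-contained sketch fits a logit model by stochastic gradient ascent on the log-likelihood. It is a pedagogical stand-in, not the authors' procedure (which would typically rely on standard statistical software); the learning rate and epoch count are arbitrary assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    # Maximize the log-likelihood
    #   L(w, b) = sum_i [ y_i * log p_i + (1 - y_i) * log(1 - p_i) ],
    # where p_i = sigmoid(w . x_i + b) and y_i is 0 or 1, by following
    # its gradient: dL/dw_j = sum_i (y_i - p_i) * x_ij.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for row, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)
            err = yi - p
            w = [wj + lr * err * xj for wj, xj in zip(w, row)]
            b += lr * err
    return w, b

def predict_proba(row, w, b):
    # The posterior probability has a closed form, one of the stated
    # advantages of logit modeling over SVMs.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)
```

On a toy one-predictor dataset with churners at high predictor values, the fitted model assigns low posterior churn probability to low values and high probability to high values.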

3. Evaluation criteria

After building a predictive model, marketers want to use these classification models to predict future behavior. It is essential to evaluate the classifier in terms of performance. First, the predictive model is estimated on a training set. Afterwards, this model is validated on an unseen dataset, the test set. It is essential to evaluate the performance on a test set in order to ensure that the trained model is able to generalize well. For all three modeling techniques, PCC, AUC and the top-decile lift are calculated.

PCC, also known as accuracy, is undoubtedly the most commonly used evaluation metric of a classifier. Practically, the posterior churn probabilities generated by the classifier are ranked from most likely to churn to least likely to churn. All cases above a certain threshold are classified as churners; all cases having a lower churn probability are classified as non-churners. In sum, PCC computes the ratio of correctly classified cases to the total number of cases to be classified. It is important to notice that PCC is highly dependent on the chosen threshold, because only one threshold is considered. Consequently, it does not give an indication of how the performance will vary when the cut-off is varied. Moreover, PCC does not consider the individual class performance of a classifier. For example, within a skewed class distribution, wrong predictions for the underrepresented class are very costly. Nevertheless, a model that always predicts the most common class, thus neglecting the minority class, still provides a relatively good performance when evaluated on PCC.

Unlike PCC, AUC takes into account the individual class performance for all possible thresholds. In other words, AUC will compare the predicted class of an event with the real class of that event, considering all possible cut-off values for the predicted class.
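The three criteria just listed can be computed directly from the true outcomes (1 = churner, 0 = non-churner) and the classifier's posterior churn probabilities. The sketch below is illustrative only, not the authors' implementation; it uses the rank-comparison formulation of AUC, which is equivalent to the area under the ROC curve discussed in this section.

```python
def pcc(y_true, scores, threshold=0.5):
    # Percentage of cases correctly classified at one fixed cut-off.
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)

def auc(y_true, scores):
    # Probability that a randomly chosen event outranks a randomly
    # chosen non-event; ties count for half.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def top_decile_lift(y_true, scores):
    # Density of real events among the 10% highest-scoring cases,
    # relative to the event density in the whole dataset.
    ranked = sorted(zip(scores, y_true), reverse=True)
    k = max(1, len(ranked) // 10)
    top_rate = sum(t for _, t in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate
```

Note how PCC depends on the single `threshold` argument, while AUC aggregates over all possible cut-offs, which is exactly the contrast drawn above.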
The receiver operating curve (ROC) is a graphical plot of the sensitivity, i.e., the number of true positives versus the total number of events, against 1 - specificity, i.e., the number of false positives versus the total number of non-events. In other words, the ROC plots the fraction of true positives against the fraction of false positives. The area under the receiver operating curve is used to evaluate the performance of a binary classification system (Hanley & McNeil, 1982). In order to assess whether the AUCs of the different classification techniques are significantly different from each other, the non-parametric test of DeLong, DeLong, and Clarke-Pearson (1988) is used.

In marketing applications, one is especially interested in increasing the density of the real events. The top-decile lift is an evaluation measure that focuses only on the 10% of cases most likely to churn. Practically, the cases are first sorted from predicted most likely to churn to predicted least likely to churn. Afterwards, the proportion of real events in the top 10% most likely to churn is compared with the proportion of real events in the total dataset. This increase in density is called the top-decile lift. For example, a top-decile lift of two means that the density of churners in the top 10% is twice the density of churners in the total dataset. The higher the top-decile lift, the better the classifier. This top decile is potentially very interesting to target, because it contains a higher number of real events. In other words, marketing analysts are interested in just 10% of the customer base, i.e., those who are most likely to churn, because marketing budgets are limited and actions to reduce churn would typically involve only 10% of the entire list of customers.

4. Model selection for the support vector machines

First, we will argue why the radial basis function (RBF) kernel is used as the default kernel function throughout this study. Secondly, the grid-search method and the cross-validation procedure for choosing the optimal penalty parameter C and kernel parameter γ are explained.
In the third section, the two parameter-selection techniques are described.

4.1. RBF kernel function

The RBF kernel function is used as the default kernel function within this study, mainly for four reasons (Hsu et al., 2004): (1) this type of kernel makes it possible to map the non-linear boundaries of the input space into a higher-dimensional feature space. So, unlike the linear kernel, the RBF kernel can handle a non-linear relationship between the dependent and the explanatory variables. (2) In terms of performance, Keerthi and Lin (2003) concluded that the linear kernel with a parameter C performs like the RBF kernel with parameters (C, γ) for certain parameter settings, and Lin and Lin (2003) showed that the sigmoid kernel behaves like the RBF kernel for certain parameters. (3) When looking at the number of hyperparameters, the polynomial kernel has more hyperparameters than the RBF kernel. (4) The RBF kernel has fewer numerical difficulties, because the kernel values lie between zero and one, while the polynomial kernel values may go to infinity or zero when the degree is large. On the basis of these arguments, the RBF kernel is used as the default kernel function.

4.2. Optimal parameter selection using grid search and cross-validation

The RBF kernel needs two parameters to be set: C and γ, with C the penalty parameter for the error term and γ the kernel parameter. Both parameters play a crucial role in the performance of SVMs (Hsu et al., 2004; Kim, Shin et al., 2005; Kim, Yang et al., 2005). Improper selection of these parameters can be counterproductive. Beforehand, it is impossible to know which combination of (C, γ) will result in the highest performance when validating the trained SVM on unseen data. Some kind of parameter-selection procedure therefore has to be applied. Hsu et al. (2004) propose a grid search on C and γ combined with a v-fold cross-validation on the training data. The goal of this

procedure is to identify the optimal C and γ, so that the classifier can accurately predict unseen data. A simple way to accomplish this is 2-fold cross-validation, where the training set is divided into two parts, of which one is unseen when training the classifier. This performance better reflects the capabilities of the classifier in validating unknown data. More generally, in a v-fold cross-validation, the training data is split into v subsets of equal size. Iteratively, one part is left out for validation, while the other (v - 1) parts are used for training. Finally, each case in the training set is predicted once. The cross-validation performance will better reflect the true performance when validating the classifier on unseen data, while the validation set stays untouched. In order to identify which parameter pair performs best, one can repeat this procedure for several pairs of (C, γ). As such, it is possible to calculate a cross-validated evaluation measure for every parameter pair. In the end, it is possible to select the parameters based on the best cross-validated performance.

4.3. Two parameter-selection techniques

In this study, a grid search on C and γ is performed on the training set using a 5-fold cross-validation. The grid search is realized by evaluating exponential sequences of C and γ (i.e., C = 2^-5, 2^-3, ..., 2^13; γ = 2^3, 2^1, ..., 2^-15). Basically, all combinations of (C, γ) are tried and two pairs of parameters are retained: (1) the one with the best cross-validated accuracy, as proposed by Hsu et al. (2004), and (2) the one with the largest cross-validated area under the receiver operating curve. This additional parameter pair is selected because, unlike PCC, AUC considers the sensitivity and specificity as individual class performance metrics over all possible thresholds. Once these optimal parameter pairs are obtained, the model is retrained on the whole training set. Both classifiers will be used to validate an unseen dataset.
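The two-pair selection rule of Section 4.3 can be sketched as a generic loop over the exponential grid. Here `evaluate` is a hypothetical placeholder for the 5-fold cross-validation step: any function returning a (PCC, AUC) tuple for a given (C, γ) pair fits, so the sketch shows the selection logic rather than the authors' actual tooling.

```python
from itertools import product

def grid_search(evaluate, c_exponents=range(-5, 14, 2),
                gamma_exponents=range(-15, 4, 2)):
    """Try every (C, gamma) pair on an exponential grid and retain two
    pairs: the one with the best cross-validated PCC and the one with
    the best cross-validated AUC.  `evaluate(C, gamma)` must return a
    (pcc, auc) tuple, e.g. from a 5-fold cross-validation."""
    best_pcc, best_auc = None, None
    for ce, ge in product(c_exponents, gamma_exponents):
        C, gamma = 2.0 ** ce, 2.0 ** ge
        p, a = evaluate(C, gamma)
        if best_pcc is None or p > best_pcc[0]:
            best_pcc = (p, C, gamma)   # pair (1): best accuracy
        if best_auc is None or a > best_auc[0]:
            best_auc = (a, C, gamma)   # pair (2): best AUC
    return best_pcc, best_auc
```

After the two winning pairs are found, the SVM is retrained on the whole training set with each pair, as described above.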
In the end, one can compare and benchmark the performance of both kinds of SVMs.

5. Research data

For the purpose of this study, data from a Belgian newspaper publishing company is used. The subscribers have to pay a fixed amount of money depending on the length of the subscription and the promotional offer given. The company does not allow ending the subscription prior to the maturity date. The churn-prediction problem in this subscription context comes down to predicting whether the subscription will or will not be renewed within a period of four weeks after the maturity date. During this four-week period, the company still delivers the newspapers to the subscribers. In this way, the company gives the subscribers the opportunity to renew their subscription. Fig. 3 graphically traces the time window of the analysis.

Fig. 3. Graphical display of the time window used to build the churn model.

We use subscription data from January 2002 through September 2005. Using this time frame, it is possible to derive the dependent variable and the explanatory variables. For constructing the dependent variable, the renewal points between July 2004 and July 2005 are considered. Consequently, a customer is considered a churner when his/her subscription is not renewed within four weeks after the expiry date. The explanatory variables contain information covering a 30-month period counting back from every individual renewal point. These variables contain information about client/company interactions, renewal-related information, socio-demographics and subscription-describing information (see Appendix A). This variety of information is gathered at two levels: the subscription level and the subscriber level. At the subscription level, all information from the current subscription is included, while at the subscriber level, all information related to the subscriber is covered. For instance, one can calculate the total number of complaints on the current subscription only, i.e., the subscription level, while one can also consider the total number of complaints of a subscriber covering all his/her subscriptions, i.e., the subscriber level.
Finally, one ends up with an individual timeline per subscriber for every renewal point in the time interval. We decided to randomly select two samples of sufficient size; the training set is used to estimate the model, while the test set is used to validate the model. The training set contains as many churners as non-churners because many authors emphasize the need for a balanced training sample in order to reliably differentiate between defectors and non-defectors (Dekimpe & Degraeve, 1997; Rust & Metters, 1996; Yamaguchi, 1992). So it is not uncommon to train a model with a non-natural distribution (Chan & Stolfo, 1998; Weiss & Provost, 2001).

Table 2
Distribution of the training set and test set

                             Number of observations   Relative percentage
Training set
  Subscriptions not renewed  22,500                   50
  Subscriptions renewed      22,500                   50
  Total                      45,000                   100
Test set
  Subscriptions not renewed   5,014                   11.14
  Subscriptions renewed      39,986                   88.86
  Total                      45,000                   100

Table 3
The cross-validated accuracy (%) per (C, γ)

γ       C = 2^-5  2^-3    2^-1    2^1     2^3     2^5     2^7     2^9     2^13
2^3     56.360    64.351  65.756  66.248  65.469  65.181  64.924  64.519  64.342
2^1     68.147    70.525  71.733  71.418  70.453  69.458  68.836  68.314  68.087
2^-1    75.353    76.582  77.127  76.262  74.627  72.958  71.947  70.859  70.394
2^-3    75.959    77.144  77.649  77.622  77.558  76.440  74.918  73.320  71.842
2^-5    74.789    76.164  76.960  77.471  78.039  77.996  77.924  78.056  76.396
2^-7    74.367    74.948  75.975  76.440  77.118  77.758  77.719  78.084  78.089
2^-9    75.163    74.349  74.827  75.907  76.167  76.693  76.959  77.722  77.726
2^-11   74.240    75.209  74.344  74.840  75.856  76.107  76.271  76.517  77.144
2^-13   54.767    74.213  75.198  74.403  74.836  75.860  76.093  76.202  76.398
2^-15   50.000    64.406  74.103  75.198  74.406  74.829  75.872  76.089  76.182

Table 4
The cross-validated performance (AUC) per (C, γ)

γ       C = 2^-5  2^-3    2^-1    2^1     2^3     2^5     2^7     2^9     2^13
2^3     75.710    76.201  76.083  75.279  74.283  73.669  73.273  72.918  72.610
2^1     80.059    80.221  80.092  78.600  77.058  75.878  75.007  74.441  74.085
2^-1    82.703    83.552  83.722  82.728  80.951  79.069  77.616  76.386  75.442
2^-3    83.865    84.296  84.507  84.472  83.857  82.406  80.500  78.402  76.388
2^-5    83.373    83.926  84.172  84.496  84.691  84.592  84.212  83.239  81.745
2^-7    82.871    83.188  83.670  83.896  84.172  84.514  84.702  84.699  84.477
2^-9    82.232    82.810  83.087  83.506  83.625  83.861  84.173  84.504  84.674
2^-11   80.998    82.229  82.790  83.059  83.448  83.462  83.593  83.869  84.190
2^-13   72.936    80.996  82.228  82.785  83.052  83.431  83.393  83.436  83.601
2^-15   50.000    72.987  80.995  82.228  82.784  83.051  83.427  83.377  83.375
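The balanced training sample shown in Table 2 (and, later, test sets with other churn proportions) can be drawn by keeping all churners and randomly undersampling the non-churners until a target churn rate is reached; a minimal sketch with illustrative names, not the authors' code:

```python
import random

def undersample(cases, labels, churn_rate, seed=0):
    """Keep every churner (label 1) and randomly undersample non-churners
    (label 0) so that churners make up `churn_rate` of the returned sample.
    Assumes enough non-churners are available for the requested rate."""
    rng = random.Random(seed)
    churners = [c for c, y in zip(cases, labels) if y == 1]
    others = [c for c, y in zip(cases, labels) if y == 0]
    n_others = round(len(churners) * (1 - churn_rate) / churn_rate)
    sample = [(c, 1) for c in churners]
    sample += [(c, 0) for c in rng.sample(others, n_others)]
    rng.shuffle(sample)
    return sample
```

With `churn_rate=0.5` this yields a balanced 50/50 training set; the same routine with lower rates produces the artificial test distributions used in Section 6.2.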
The test set contains a proportion of churners that is representative of the true population in order to approximate the predictive performance in a real-life situation. For both datasets, all variables are constructed in the same way. The explanatory variables are compiled over a 30-month period, while the dependent variable indicates whether or not the subscription is renewed.

6. Empirical analysis

6.1. SVM models

After conducting the grid search on the training data, the optimal (C, γ) is (2^13, 2^-7), with a cross-validated accuracy of 78.089%. Table 3 summarizes the results of the grid search using the cross-validated accuracy as an evaluation criterion. Furthermore, parameter pair (2^7, 2^-7) results in the highest cross-validated AUC, 84.702. Table 4 shows the results of the grid-search procedure with the cross-validated AUC as a performance measure. These two parameter pairs are used to train a model on the complete training set. Two SVMs are obtained, namely SVM_acc¹ and SVM_auc². Finally, both models can be validated on a test set. On the one hand, one can compare the performance of both SVMs, while on the other hand both SVMs can be benchmarked against the performance of logistic regression and random forests.

6.2. Comparing predictive performance of both kinds of SVMs

In this section, a comparison is made between the predictive performance of SVM_acc and SVM_auc. The evaluation is performed in terms of AUC, PCC and top-decile lift. Both models are trained on a balanced training set, while in the end these classifiers have to be evaluated on a dataset which represents the actual density of churners (see Table 2). In order to assess the sensitivity of the results to the actual proportion of churners in the dataset, we will compare the performance of both SVMs on artificial test sets with different class distributions. More specifically,

¹ SVM_acc = SVM generated using the parameters of the model with the best cross-validated accuracy during the grid search.
² SVM_auc = SVM generated using the parameters of the model with the best cross-validated AUC during the grid search.

we compare the natural distribution³ (11.14% churners) with the artificial ones (50%, 40%, 30%, 20%, 18%, 16%, 14%). These artificial sets are created by randomly undersampling the real test set, i.e., the one with 11.14% churners. Figs. 4–6 and Table 5 depict the performance of SVM_acc and SVM_auc for the different class distributions. As such, a comparison can be made between both SVMs. As one may observe from Fig. 4, SVM_auc performs better than SVM_acc within all class distributions in terms of AUC performance. In order to ensure that the differences in AUC are significant, the test proposed by DeLong et al. (1988) is applied. As such, one can test whether the AUCs of SVM_acc and SVM_auc are significantly different within a certain class distribution. Table 5 reveals that on all test sets that contain 30% churners or less, SVM_auc significantly outperforms SVM_acc at the 90% confidence level (DeLong et al., 1988). When validated on the natural distribution, SVM_auc significantly outperforms SVM_acc at the 95% confidence level. Fig. 5 shows the performance of both SVMs in terms of PCC. Despite the fact that the differences in PCC are rather small, one may observe that SVM_auc does not have an inferior performance compared to SVM_acc when coming closer to the natural distribution. Previous findings are confirmed when evaluating both SVMs using the top-decile lift. There is a gap in top-decile lift between SVM_acc and SVM_auc.

Fig. 4. Area under the receiver operating curve for SVM_acc and SVM_auc applied to several test sets with different class distributions.

Fig. 5. Percentage correctly classified for SVM_acc and SVM_auc applied to several test sets with different class distributions.

³ I.e., the distribution that contains the proportion of churners that is representative of the true population.
SVM_auc has a higher top-decile lift compared to SVM_acc. This gap increases when deviating from the original training distribution, i.e., the one with 50% churners. On the natural distribution, SVM_auc succeeds in retaining more churners within the top 10% of customers most likely to churn in comparison to SVM_acc.
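Top-decile lift, used throughout this section, is the churn rate among the 10% of customers with the highest predicted churn scores divided by the overall churn rate; for example, a lift of 4.5 on the 11.14% base rate means that roughly half of the targeted decile are churners. A minimal sketch (our naming, not from the paper):

```python
def top_decile_lift(scores, labels):
    """Churn rate in the top 10% of customers ranked by score, divided by
    the overall churn rate. labels: 1 = churner, 0 = non-churner."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_top = max(1, len(ranked) // 10)
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    overall_rate = sum(labels) / len(labels)
    return top_rate / overall_rate
```

A perfect model on a 10%-churn test set reaches a lift of 10; a random model scores 1.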

Fig. 6. Top-decile lift for SVM_acc and SVM_auc applied to several test sets with different class distributions.

Table 5
Pairwise comparison of performance (AUC) among several test sets using different class distributions

Number of churners (%)   SVM_acc vs SVM_auc: χ²(df); p-value
50                       1.16 (1); 0.281
40                       1.57 (1); 0.210
30                       3.65 (1); 0.056 a
20                       3.78 (1); 0.052 a
18                       4.35 (1); 0.037 a,b
16                       4.27 (1); 0.039 a,b
14                       5.96 (1); 0.014 a,b
11.14                    6.04 (1); 0.014 a,b

a Significantly different at the 90% confidence level.
b Significantly different at the 95% confidence level.

Table 6 compares the predictive capabilities of SVM_acc and SVM_auc on the real test set (see Table 2). One can clearly see the gap in performance. SVM_auc exhibits better predictive performance than SVM_acc when both models are evaluated on the real test set. In terms of PCC, the increase is 0.55% points. There is also a significant improvement in AUC of 0.24 (DeLong et al., 1988). With respect to the top-decile lift, an increase from 4.209 to 4.492 is achieved. In sum, when an SVM is trained with a non-natural distribution, it may be better to select its parameters during the grid search based on the cross-validated AUC. The new parameter-selection technique significantly improves the AUC and the top-decile lift of the model, while accuracy is certainly not decreased.

Table 6
The performance of SVM_acc and SVM_auc: PCC, AUC and top-decile lift on the real test set

          PCC     AUC       Top-decile lift
SVM_acc   88.08   84.90 a   4.209
SVM_auc   88.63   85.14 a   4.492

a Significantly different at the 95% confidence level.

In the following part, we compare the performance of both kinds of SVMs with logistic regression and random forests.

6.3. Comparing predictive performance of SVMs, logit and random forests

The evaluation measures on the real test set (see Table 2) for all models are presented in Tables 7–9.
Table 7 compares the predictive performance of logit, random forests, SVM_acc and SVM_auc in terms of PCC and AUC. Table 8 shows the results of the test of DeLong et al. (1988), which investigates whether the AUCs of two models are significantly different. One can find the top-decile lift for all models in Table 9. Additionally, Tables 7–9 give information concerning the performance of SVM_acc and SVM_auc benchmarked against

Table 7
The performance of the different algorithms: PCC and AUC on the real test set

Model            PCC     AUC
Logit            88.47   84.60
Random forests   89.14   87.21
SVM_acc          88.08   84.90
SVM_auc          88.63   85.14

Table 8
Pairwise comparison of performance (AUC) on the real test set: χ²(df)

                 Random forests   SVM_acc        SVM_auc
Logit            219.52 (1) a     2.53 (1) b,c   12.56 (1) a
Random forests                    190.44 (1) a   166.58 (1) a
SVM_acc                                          6.04 (1) a

a Significantly different at the 95% confidence level.
b Equal at the 95% confidence level.
c Equal at the 90% confidence level.
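The AUC figures in Tables 4–8 equal the Mann-Whitney rank statistic: the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as one half. The DeLong et al. (1988) test adds a variance estimate on top of this quantity; the sketch below (our naming) computes only the statistic itself, not the significance test:

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank statistic.
    labels: 1 = churner (positive class), 0 = non-churner."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count positive-vs-negative score comparisons; ties count 1/2.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfectly separating model yields 1.0, a random one 0.5; this pairwise loop is quadratic and a rank-based formulation would be used on large test sets.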

logistic regression. Only SVM_auc differs significantly in predictive performance from logistic regression. In contrast to SVM_auc, SVM_acc classifies fewer cases correctly than logistic regression. Moreover, the test of DeLong et al. (1988) confirms that the AUC of SVM_acc is not significantly different from that of the logistic regression. The need to select the right parameter-selection technique is confirmed when looking at the top-decile lift criterion. SVM_auc identifies more churners than logistic regression, while the top-decile lift of SVM_acc is lower than that of logit.

Table 9
The performance of the different algorithms: top-decile lift on the real test set

Logit   Random forests   SVM_acc   SVM_auc
4.478   4.754            4.209     4.492

From Tables 7–9, one can also compare the performance of both SVMs with that of random forests. It is clear that, regardless of the parameter-selection technique, SVMs are surpassed by random forests. In sum, it is shown that the parameter-selection technique influences the predictive performance of SVMs. Consequently, when an SVM is trained on a balanced distribution, it may be viable and preferable to consider methods other than the traditional parameter-selection methods. Each improvement in predictive performance will result in a better return on investment of subscriber-retention actions based on these prediction models. In this study, SVMs are trained on a non-natural distribution; it is shown that selecting the parameters based on the best cross-validated AUC results in a better performance than selecting them based on the highest cross-validated accuracy, as suggested in Hsu et al. (2004). In sum, one may say that choosing the right parameter-selection technique is vital for optimizing an SVM application. In the end, it would also be counterproductive to simply rely on traditional techniques like logistic regression: SVMs in combination with the correct parameter-selection technique, as well as random forests, both outperform logistic regression.
Nevertheless, in this study random forests are better at predicting churn in subscription services than SVMs.

6.4. Variable importance

In this section, an overview of the most important variables is given. This is done based on the outcome of the random forest importance measures, for mainly two reasons: (i) random forests give the best predictive performance compared to logistic regression and SVMs; (ii) unlike random forests, the SVM software does not produce an internal ranking of variable importance. Moreover, we do not report any measures for logistic regression (e.g., standardized estimates) because most measures are prone to multicollinearity. However, this is not a problem when the focus lies mainly on prediction. In this study, we elaborate on the top-10 most important churn predictors. It is clear from Appendix B that the length of the subscription and recency (i.e., the elapsed time since the last renewal), which both belong to the category of variables describing a subscription⁴, are ranked on top. Furthermore, another variable from the same category, i.e., the month of contract expiration, is part of the top-10 churn-explaining variables. In contrast to extant research (e.g., Bauer, 1988), monetary value and frequency (i.e., the number of renewal points) are not present in the top-10 list of most important churn predictors in this study. Although most important churn predictors are variables that belong to the group of variables describing a subscription, the impact of some client/company-interaction variables cannot be neglected when investigating the top-10 list:

(i) Variables related to the ability to voluntarily suspend the subscription (during a holiday, during a business trip, ...) are present in the top-10.
(ii) Recency of complaining, i.e., the elapsed time since the last complaint, is also present among the top-10 most important churn predictors. Consequently, efficient complaint-handling strategies are important.
Tax, Brown, and Chandrashekaran (1998) already stated that companies do not deal successfully with service failures because most companies underestimate the impact of efficient complaint handling.
(iii) Moreover, this study shows that the variable indicating whether or not a subscription was started on the subscriber's own initiative belongs to the top-10 list, in contrast to similar variables related to other purchase motivators like direct mailing campaigns, tele-marketing actions, face-to-face promotions, ...

In spite of the importance of age, one can conclude that socio-demographics do not play an important role in explaining churn in this study, which confirms the findings of Guadagni and Little (1983) and, more recently, Rossi, McCulloch, and Allenby (1996).

⁴ See Appendix A.

7. Conclusions and future research

In this study, we show that SVMs are able to predict churn in subscription services. By mapping non-linear inputs into a high-dimensional feature space, SVMs break down complex problems into simpler discriminant functions. Because SVMs are based on the Structural Risk Minimization principle, which minimizes the upper bound on the actual risk, they show a very good performance when applied to a new, noisy marketing dataset. To validate the performance of this novel technique, we statistically compare its predictive performance with those of logistic regression and random forests. It is shown that an SVM trained on a balanced distribution outperforms a logistic regression only when the appropriate parameter-selection technique is applied. However, when comparing the predictive capabilities of these SVMs with state-of-the-art random forests, our study indicates that SVMs are surpassed by the random forests. In particular, in this study we implement a grid search using a 5-fold cross-validation for obtaining the optimal upper bound C and kernel parameter γ, which are the most important parameters when implementing an SVM. This study offers an alternative parameter-selection technique that outperforms the technique previously used by Hsu et al. (2004). The way in which the optimal parameters are selected can have a significant influence on the performance of an SVM. Taking into account alternative parameter-selection techniques is crucial because even the smallest change in predictive performance can significantly increase the return on investment of the marketing-retention actions based on these prediction models (Van den Poel & Larivière, 2004). In addition, one can say that academics as well as practitioners do not have to rely solely on traditional techniques like logistic regression: SVMs in combination with the right parameter-selection technique, and random forests, offer alternatives. Nevertheless, a trade-off has to be made between the time allocated to the modeling procedure and the performance achieved. In this study, the most important churn predictors are part of the group of variables describing the subscription. Unlike ample prior research, monetary value and frequency are not present among the top-10 most important churn drivers. On the other hand, several client/company-interaction variables play an important role in predicting churn. In spite of the importance of age, socio-demographics do not play an important role in explaining churn in this study. Directions for future research are given by the fact that nowadays there is no complete working meta-theory to assist with the selection of the correct kernel function and SVM parameters.
Deriving a procedure to select the proper kernel function and the correct parameter values for a specific type of classification problem is an interesting topic for further research. Furthermore, applying SVMs to a sufficiently large sample can be very time-consuming due to the long computational time, and it often requires specific software. Before SVMs can be widely adopted, easy-to-use computer software should be available in the traditional data mining packages.

Acknowledgements

We would like to thank the anonymous Belgian publishing company for making their data available. Next, we would also like to thank (1) Ghent University for funding the PhD project of Kristof Coussement (BOF 01D26705) and (2) the Flemish government and Ghent University (BOF equipment 011B5901) for funding our computing resources during this project. Also special thanks to L. Breiman (†) for freely distributing the random forest software, as well as C.-C. Chang and C.-J. Lin for sharing their SVM toolbox, LIBSVM.

Appendix A. Explanatory variables included in the churn-prediction model

Client/company-interaction variables: variables describing the client/company relationship
- the number of complaints;
- elapsed time since the last complaint;
- the average cost of a complaint (in terms of compensation newspapers);
- the average positioning of the complaints in the current subscription;
- the purchase motivator of the subscription;
- how the newspaper is delivered;
- the conversions made in distribution channel, payment method & edition;
- elapsed time since the last conversion in distribution channel, payment method & edition;
- the number of responses to direct marketing actions;
- the number of suspensions;
- the average suspension length (in number of days);
- elapsed time since the last suspension;
- elapsed time since the last response to a direct marketing action;
- the number of free newspapers.
Renewal-related variables: variables containing renewal-specific information
- whether the previous subscription was renewed before the expiry date;
- how many days before the expiry date the previous subscription was renewed;
- the average number of days the previous subscriptions were renewed before the expiry date;
- the variance in the number of days the previous subscriptions were renewed before the expiry date;
- elapsed time since the last step in the renewal procedure;
- the number of times the churner did not renew a subscription.

Socio-demographic variables: variables describing the subscriber
- age;
- whether the age is known;
- gender;
- physical person (is the subscriber a company or a physical person);
- whether contact information (telephone, mobile number, email) is available.

Subscription-describing variables: group of variables describing the subscription
- elapsed time since the last renewal;
- monetary value;

- the number of renewal points;
- the length of the current subscription;
- the number of days a week the newspaper is delivered (intensity indication);
- what product the subscriber has;
- the month of contract expiration.

Appendix B. Variable importance measures

No.  AvgNormImp  Variable name  Level a  Relative variable b

1   73.946  The length of the current subscription  (Subscription)
2   65.335  Elapsed time since last renewal  (Subscription)
3   59.460  Elapsed time since last suspension  (Subscriber)
4   54.764  Elapsed time since last suspension  (Subscription)
5   54.035  The month of contract expiration  (Subscription)
6   52.705  Age  (Subscriber)
7   51.467  Elapsed time since last complaint  (Subscriber)
8   51.056  The average suspension length (in number of days)  (Subscriber)  X
9   50.251  The purchase motivator of the subscription: own initiative  (Subscription)
10  48.560  The average suspension length (in number of days)  (Subscriber)
11  48.073  Elapsed time since last complaint  (Subscription)
12  47.330  Monetary value  (Subscription)
13  46.882  Elapsed time since last step in renewal procedure  (Subscription)
14  46.520  Physical person: physical person YES/NO  (Subscriber)
15  44.811  The variance in the number of days the previous subscriptions are renewed before expiry date  (Subscriber)
16  44.357  The average number of days the previous subscriptions are renewed before expiry date  (Subscriber)
17  43.337  Elapsed time since last response on a direct marketing action  (Subscriber)
18  42.310  The average number of days the previous subscriptions are renewed before expiry date  (Subscription)
19  40.011  The number of renewal points  (Subscription)
20  38.448  The number of suspensions  (Subscriber)  X
21  37.295  The average suspension length (in number of days)  (Subscription)  X
22  37.158  The purchase motivator of the subscription: direct marketing action  (Subscription)
23  36.536  The number of suspensions  (Subscription)  X
24  35.519  How many days before the expiry date the previous subscription was renewed  (Subscription)
25  35.279  Elapsed time since last conversion in payment method  (Subscriber)
26  33.802  Elapsed time since last conversion in payment method  (Subscription)
27  33.396  The number of complaints  (Subscriber)  X
28  33.146  The average positioning of the complaints in the current subscription  (Subscription)
29  32.520  The conversions made in payment method  (Subscription)
30  32.481  The average suspension length (in number of days)  (Subscription)
31  32.107  The conversions made in payment method  (Subscription)  X
32  31.637  The number of responses on direct marketing actions  (Subscriber)  X
33  31.144  The variance in the number of days the previous subscriptions are renewed before expiry date  (Subscription)
34  29.640  The conversions made in payment method  (Subscriber)
35  28.116  What product the subscriber has: edition X  (Subscription)
36  28.027  The purchase motivator of the subscription: tele-marketing action  (Subscription)
37  27.860  What product the subscriber has: edition Y  (Subscription)
38  27.584  The conversions made in payment method  (Subscriber)  X
39  26.390  Elapsed time since last conversion in edition  (Subscriber)
40  25.442  The number of responses on direct marketing actions  (Subscriber)

Appendix B (continued)

No.  AvgNormImp  Variable name  Level a  Relative variable b

41  24.942  Elapsed time since last conversion in distribution channel  (Subscription)
42  24.802  The number of suspensions  (Subscriber)
43  24.237  The number of complaints  (Subscription)  X
44  24.193  Whether the previous subscription was renewed before the expiry date  (Subscription)
45  23.993  Elapsed time since last conversion in edition  (Subscription)
46  23.545  The purchase motivator of the subscription: promotional offer  (Subscription)
47  23.008  The number of suspensions  (Subscription)
48  22.991  Elapsed time since last conversion in distribution channel  (Subscriber)
49  22.486  The number of complaints  (Subscriber)
50  21.466  How the newspaper is delivered: private distribution channel  (Subscription)
51  20.917  Gender: female YES/NO  (Subscriber)
52  20.087  The number of complaints  (Subscription)
53  19.624  Physical person: company YES/NO  (Subscriber)
54  19.600  How the newspaper is delivered: individual newsboy  (Subscription)
55  18.930  Whether the age is known  (Subscriber)
56  18.906  The number of times the subscriber did not renew a subscription  (Subscriber)
57  18.426  The conversions made in distribution channel  (Subscriber)  X
58  17.802  The conversions made in edition  (Subscriber)  X
59  17.718  The conversions made in distribution channel  (Subscriber)
60  17.289  The purchase motivator of the subscription: direct marketing action  (Subscription)
61  17.249  The purchase motivator of the subscription: face-to-face marketing  (Subscription)
62  16.996  The conversions made in edition  (Subscriber)
63  16.534  The conversions made in distribution channel  (Subscription)
64  16.095  The conversions made in edition  (Subscription)  X
65  15.446  The conversions made in distribution channel  (Subscription)  X
66  15.406  What product the subscriber has: edition Z  (Subscription)
67  15.222  The conversions made in edition  (Subscription)
68  14.531  The average cost of a complaint (in terms of compensation newspapers)  (Subscriber)  X
69  13.995  The average cost of a complaint (in terms of compensation newspapers)  (Subscription)  X
70  13.602  Gender: male YES/NO  (Subscriber)
71  12.587  The average cost of a complaint (in terms of compensation newspapers)  (Subscriber)
72  12.005  How the newspaper is delivered: public distribution channel  (Subscription)
73  11.830  Gender: private company YES/NO  (Subscriber)
74  11.550  The purchase motivator of the subscription: direct marketing mailing action  (Subscription)
75  11.059  How the newspaper is delivered: pick up newspaper at shop  (Subscription)
76  10.651  The average cost of a complaint (in terms of compensation newspapers)  (Subscription)
77   7.601  Gender: public company YES/NO  (Subscriber)
78   7.027  The number of free newspapers  (Subscription)
79   5.190  The number of days a week the newspaper is delivered (intensity indication)  (Subscription)
80   4.979  Whether contact information (telephone, mobile number, email) is available  (Subscriber)
81   2.991  How the newspaper is delivered: delivered abroad via courier  (Subscription)
82   2.093  What product the subscriber has: edition W  (Subscription)

a See Section 5: Research data.
b Correction of the variable by using the length of the subscription.

References

Acir, N. (2006). A support vector machine classifier algorithm based on a perturbation method and its application to ECG beat recognition systems. Expert Systems with Applications, 31(1), 150–158.
Allison, P. D. (1999). Logistic regression using the SAS system: Theory and application. Cary, NC: SAS Institute Inc.
Athanassopoulos, A. D. (2000). Customer satisfaction cues to support market segmentation and explain switching behavior. Journal of Business Research, 47(3), 191–207.
Bauer, C. L. (1988). A direct mail customer purchase model. Journal of Direct Marketing, 2(3), 16–24.
Bicego, M., Grosso, E., & Tistarelli, M. (2005). Face authentication using one-class support vector machines. Lecture Notes in Computer Science, 3781, 15–22.
Bratko, A., & Filipic, B. (2006). Exploiting structural information for semi-structured document categorization. Information Processing and Management, 42(3), 679–694.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Buckinx, W., & Van den Poel, D. (2005). Customer base analysis: Partial defection of behaviourally loyal clients in a non-contractual FMCG retail setting. European Journal of Operational Research, 164(1), 252–268.
Bucklin, R. E., & Gupta, S. (1992). Brand choice, purchase incidence and segmentation: An integrated modeling approach. Journal of Marketing Research, 29, 201–215.
Burez, J., & Van den Poel, D. (forthcoming). CRM at Canal+ Belgique: Reducing customer attrition through targeted marketing. Expert Systems with Applications.
Burges, C. J. C., & Scholkopf, B. (1997). Improving the accuracy and speed of support vector machines. In M. Mozer, M. Jordan, & T. Petsche (Eds.), Advances in neural information processing systems. Cambridge, MA: MIT Press.
Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2), 121–167.
Chan, P. K., & Stolfo, S. J. (1998).
Learning with non-uniform class and cost distributions: A case study in credit card fraud detection. In Proceedings of the fourth international conference on knowledge discovery and data mining (pp. 164–168).
Chang, C.-C., & Lin, C.-J. (2004). LIBSVM: A library for support vector machines. Technical Report, Department of Computer Science and Information Engineering, National Taiwan University.
Chen, K.-Y., & Wang, C.-H. (2007). A hybrid SARIMA and support vector machines in forecasting the production values of the machinery industry in Taiwan. Expert Systems with Applications, 32(1), 254–264.
Chen, X. J., Harrison, R., & Zhang, Y. Q. (2005). Multi-SVM fuzzy classification and fusion method and applications in bioinformatics. Journal of Computational and Theoretical Nanoscience, 2(4), 534–542.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
Cui, D., & Curry, D. (2005). Predictions in marketing using the support vector machine. Marketing Science, 24(4), 595–615.
Dekimpe, M. G., & Degraeve, Z. (1997). The attrition of volunteers. European Journal of Operational Research, 98(1), 37–51.
DeLong, E. R., DeLong, D. M., & Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics, 44(3), 837–845.
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification. New York: Wiley.
Egan, J. P. (1975). Signal detection theory and ROC analysis. Series in cognition and perception. New York: Academic Press.
Glotsos, D., Tohka, J., & Ravazoula, P. (2005). Automated diagnosis of brain tumours astrocytomas using probabilistic neural network clustering and support vector machines. International Journal of Neural Systems, 15(1–2), 1–11.
Guadagni, P. M., & Little, J. D. C. (1983). A logit model of brand choice calibrated on scanner data. Marketing Science, 2(3), 203–238.
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36.
Hastie, T., Tibshirani, R., & Friedman, J.
(2001). The elements of statistical learning: Data mining, inference and prediction. New York: Springer-Verlag.
He, J. Y., Hu, H. J., & Harrison, R. (2005). Understanding protein structure prediction using SVM_DT. Lecture Notes in Computer Science, 3759, 203–212.
Hsu, C.-W., Chang, C.-C., & Lin, C.-J. (2004). A practical guide to support vector classification. Technical Report, Department of Computer Science and Information Engineering, National Taiwan University.
Hung, S.-Y., Yen, D. C., & Wang, H.-Y. (2006). Applying data mining to telecom churn management. Expert Systems with Applications, 31(3), 515–524.
Jones, M. A., Mothersbaugh, D. L., & Beatty, S. E. (2000). Switching barriers and repurchase intentions in services. Journal of Retailing, 76(2), 259–374.
Keaveney, S., & Parthasarathy, M. (2001). Customer switching behavior in online services: An exploratory study of the role of selected attitudinal, behavioral and demographic factors. Journal of the Academy of Marketing Science, 29(4), 374–390.
Keerthi, S. S., & Lin, C.-J. (2003). Asymptotic behaviours of support vector machines with Gaussian kernel. Neural Computation, 15(7), 1667–1689.
Kim, S., Shin, K. S., & Park, K. (2005). An application of support vector machines for customer churn analysis: Credit card case. Lecture Notes in Computer Science, 3611, 636–647.
Kim, S. K., Yang, S., & Seo, K. S. (2005). Home photo categorization based on photographic region templates. Lecture Notes in Computer Science, 3689, 328–338.
Larivière, B., & Van den Poel, D. (2005). Predicting customer retention and profitability by using random forests and regression forests techniques. Expert Systems with Applications, 29(2), 472–484.
Li, S.-T., Shue, W., & Huang, M.-H. (2006). The evaluation of consumer loans using support vector machines. Expert Systems with Applications, 30(4), 772–782.
Lin, H.-T., & Lin, C.-J. (2003). A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods. Technical Report, Department of Computer Science and Information Engineering, National Taiwan University.
Luo, T., Kramer, K., Goldgof, D. B., Hall, L. O., Samson, S., Remsen, A., et al. (2004). Recognizing plankton images from the shadow image particle profiling evaluation recorder. IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, 34(4), 1753–1762.
Neslin, S. A., Gupta, S., Kamakura, W., Lu, J., & Mason, C. (2004). Defection detection: Improving predictive accuracy of customer churn models. Working Paper.
Pai, P. F., & Lin, C. S. (2005). Using support vector machines to forecast the production values of the machinery industry in Taiwan. International Journal of Advanced Manufacturing Technology, 27(1–2), 205–210.
Reinartz, W., & Kumar, V. (2003). The impact of customer relationship characteristics on profitable lifetime duration. Journal of Marketing, 67(1), 77–99.
Rossi, P. E., McCulloch, R. E., & Allenby, G. M. (1996). Value of household information in target marketing. Marketing Science, 15, 321–340.
Rust, R. T., & Metters, R. (1996). Mathematical models of service. European Journal of Operational Research, 91(3), 427–439.
Swets, J. A. (1989). ROC analysis applied to the evaluation of medical imaging techniques. Investigative Radiology, 14, 109–121.

Swets, J. A., & Pickett, R. M. (1982). Evaluation of diagnostic systems: Methods from signal detection theory. New York: Academic Press.
Tax, S. S., Brown, S. W., & Chandrashekaran, M. (1998). Customer evaluations of service complaint experiences: Implications for relationship marketing. Journal of Marketing, 62(April), 60–76.
Thomas, J. S. (2001). A methodology for linking customer acquisition to customer retention. Journal of Marketing Research, 38(2), 262–268.
Van den Poel, D., & Larivière, B. (2004). Customer attrition analysis for financial services using proportional hazard models. European Journal of Operational Research, 157, 196–217.
Vapnik, V. (1998). Statistical learning theory. New York: Wiley.
Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer.
Weiss, G., & Provost, F. (2001). The effect of class distribution on classifier learning. Technical Report ML-TR-43, Department of Computer Science, Rutgers University.
Yamaguchi, K. (1992). Accelerated failure-time regression models with a regression model of surviving fraction: An application to the analysis of permanent employment in Japan. Journal of the American Statistical Association, 87(418), 284–292.
Zhao, Y., Li, B., & Li, X. (2005). Customer churn prediction using improved one-class support vector machine. Lecture Notes in Artificial Intelligence, 3584, 300–306.
Zhong, W., He, J., Harrison, R., Tai, P. C., & Pan, Y. (forthcoming). Clustering support vector machines for protein local structure prediction. Expert Systems with Applications.