Knowledge Discovery in a Direct Marketing Case using Least Squares Support Vector Machines




S. Viaene,1,* B. Baesens,1 T. Van Gestel,2 J. A. K. Suykens,2 D. Van den Poel,3 J. Vanthienen,1 B. De Moor,2 G. Dedene1

1 K.U. Leuven, Department of Applied Economic Sciences, Naamsestraat 69, B-3000 Leuven, Belgium
2 K.U. Leuven, Department of Electrical Engineering ESAT-SISTA, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
3 Ghent University, Department of Marketing, Hoveniersberg 24, B-9000 Ghent, Belgium

We study the problem of repeat-purchase modeling in a direct marketing setting using Belgian data. More specifically, we investigate the detection and qualification of the most relevant explanatory variables for predicting purchase incidence. The analysis is based on a wrapped form of input selection using a sensitivity based pruning heuristic to guide a greedy, stepwise, and backward traversal of the input space. For this purpose, we make use of a powerful and promising least squares support vector machine (LS-SVM) classifier formulation. This study extends beyond the standard recency, frequency, monetary (RFM) modeling semantics in two ways: (1) by including alternative operationalizations of the RFM variables, and (2) by adding several other (non-RFM) predictors. Results indicate that elimination of redundant/irrelevant inputs allows significant reduction of model complexity. The empirical findings also highlight the importance of frequency and monetary variables, while the recency variable category seems to be of somewhat lesser importance to the case at hand. Results also point to the added value of including non-RFM variables for improving customer profiling. More specifically, customer/company interaction, measured using indicators of information requests and complaints, and merchandise returns provide additional predictive power to purchase incidence modeling for database marketing. © 2001 John Wiley & Sons, Inc.

* Author to whom correspondence should be addressed; e-mail: Stijn.Viaene@econ.kuleuven.ac.be.
Contract grant sponsor: Flemish government (Research Council K.U. Leuven: GOA-Mefisto; FWO-Flanders: ICCoS and AN-MMM; IWT: STWW Eureka).
Contract grant sponsor: Belgian government. Contract grant number: IUAP-IV 02, IV-24.
Contract grant sponsor: European Commission. Contract grant number: TMR, ERNSI.

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, VOL. 16, 1023-1036 (2001)
© 2001 John Wiley & Sons, Inc.

I. INTRODUCTION

The main objective of this paper involves the detection and qualification of the most relevant variables for repeat-purchase modeling in a direct marketing setting. This knowledge is believed to vastly enrich customer profiling and thus contribute directly to more targeted customer contact. The empirical study focuses on the purchase incidence, i.e., the issue whether or not a purchase is made from any product category offered by the direct mailing company. Standard recency, frequency, monetary modeling semantics underlie the discussed purchase incidence model [1]. This binary (buyer versus nonbuyer) classification problem is being tackled in this paper by using least squares support vector machine (LS-SVM) classifiers. LS-SVMs have recently been introduced in the literature [2] and excellent benchmark results have been reported [3].

Having constructed an LS-SVM classifier with all available predictors, we engage in an input selection experiment. Input selection has been an active area of research in the data mining field for many years now. A compact, yet highly accurate model may come in very handy in (on-line) customer profiling systems. Furthermore, by reducing the number of input features, both human understanding and computational performance can often be vastly enhanced.

Section II elaborates on some response modeling issues including a literature review and description of the data set. In Section III, we discuss the basic underpinnings of LS-SVMs for binary classification. The input selection experiment and corresponding results are presented and discussed in Section IV.

II. THE RESPONSE MODELING CASE FOR DIRECT MARKETING

A. Response Modeling in Direct Marketing

Cullinan is generally credited for identifying the three sets of variables most often used in response modeling: recency, frequency, and monetary (RFM) [1,4,5]. Since then, the literature has provided so many uses of these three variable categories, that there is overwhelming evidence both from academically reviewed studies as well as from practitioners' experience that the RFM variables are an important set of predictors for modeling mail-order repeat purchasing. However, the beneficial effect of including other variables into the response model has also been investigated.

For mail-order response modeling, several alternative problem formulations have been proposed based on the choice of the dependent variable. The first category is purchase incidence modeling [6]. In this problem formulation, the main question is whether a customer will purchase during the next mailing period, i.e., one tries to predict the purchase incidence within a fixed time interval (typically half a year). Other authors have investigated related problems dealing with both the purchase incidence and the amount of purchase in a joint model [7,8]. A third alternative perspective for response modeling is to model interpurchase time through survival analysis or (split-)hazard rate models which model whether a purchase takes place together with the duration of time until a purchase occurs [9,10].

Table I provides a summary of contributions with regard to the three alternative problem formulations. We observe that the (first) purchase incidence formulation is clearly the most popular in the existing literature [11]. Moreover, most studies include many predictors, even though only a minority includes all categories.

This paper focuses on the first type of problem, i.e., purchase incidence modeling. This choice is motivated by the fact that the majority of previous research in the direct marketing literature focuses on the purchase incidence problem [12,13]. Furthermore, this is exactly the setting that mail-order companies are typically confronted with. They have to decide whether or not a specific offering will be sent to a (potential) customer during a certain mailing period.

Given a tendency of rising mailing costs and increasing competition, we can easily see an increasing importance for response modeling [14]. Improving the targeting of the offers may indeed counter these two challenges by lowering nonresponse. Moreover, from the perspective of the recipient of the (direct mail) messages, mail-order companies do not want to overload consumers with catalogs. The importance of response modeling to the mail-order industry is further illustrated by the fact that the issue of improving targeting was among the top three concerns with 73.5% of the catalogers in the sample mentioned in Ref. 15.

B. The Data Set

From a major Belgian mail-order company, we obtained data on past purchase behavior at the order-line level, i.e., we know when a customer purchased what quantity of a particular product at which price as part of what order. This allowed us, in close cooperation with domain experts and guided by the extensive literature, to derive all the necessary purchase behavior variables for a total sample size of 5,000 customers, of which 37.94% represent buyers. For each customer, these variables were measured for the period between 1 July 1993 and 30 June 1997.
The goal is to predict whether an existing customer will repurchase in the observation period between 1 July 1997 and 31 December 1997 using the information provided by the purchase behavior variables. This problem boils down to a binary classification problem: Will a customer repurchase or not? Notice that the focus is on customer retention and not on customer acquisition.

The recency, frequency, and monetary variables have then been modeled as described in detail in Ref. 11. We used two time horizons for all RFM variables. The Hist horizon refers to the fact that the variable is measured for the period between 1 July 1993 and 30 June 1997. The Year horizon refers to the fact that the variable is measured over the last year. Including both time horizons allows us to check whether more recent data are more relevant than historical data. All RFM variables are modeled both with and without the occurrence of returned merchandise, indicated by R and N in the variable name, respectively. The former is operationalized by including the counts of returned merchandise in the variable values, whereas in the latter case these counts are omitted. Taking into account both time horizons (Year versus Hist) and inclusion versus exclusion of returned items (R versus N), we arrive at a 2 × 2 design in which each RFM variable is operationalized in four ways.

Table I. Literature review of response modeling papers.
Columns, in order:
  Independent variable: R; F; M; Other Behavioral Variables; Socio-Demographic Variables
  Dependent variable: Binary; Binary & Amount; Binary & Timing
  Context of application: Fund-Raising; Catalog (general); Catalog or Specialty Mailing

Reference                             Entries
Berger and Magliozzi (1992) [16]      X X X X X X
Bitran and Mondschein (1996) [17]     X X X X X
Bult and Wittink (1996) [18]          X X X X X
Bult (1993) [19]                      X X X X X
Bult et al. (1997) [20]               X X X X X X X
Gonul and Shi (1998) [21]             X X X X
Kaslow (1997) [22]                    X X X X X X X
Levin and Zahavi (1998) [7]           X X X X X X X X
Magliozzi and Berger (1993) [23]      X X X X X
Magliozzi (1989) [24]                 X X X X X
Thrasher (1991) [25]                  X X X X X X
Van den Poel & Leunis (1998) [10]     X X X X X X
Van den Poel (1999) [11]              X X X X X X X
Van der Scheer (1998) [8]             X X X X
Zahavi and Levin (1997) [13]          X X X X X X

For the recency variable, many operationalizations have already been suggested. In this paper, we define the recency variable as the number of days since the last purchase within a specific time window (Hist versus Year) and in- or excluding returned merchandise (R versus N) [4]. Recency has been found to be inversely related to the probability of the next purchase, i.e., the longer the time delay since the last purchase the lower the probability of a next purchase within the specific period [1].

In the context of direct mail, it has generally been observed that multibuyers (buyers who already purchased several times) are more likely to repurchase than buyers who only purchased once [4,26]. Although no detailed results are reported because of the proprietary nature of most studies, the frequency variable is generally considered to be the most important of the RFM variables [12]. Bauer suggests to operationalize the frequency variable as the number of purchases divided by the time on the customer list since the first purchase [4]. We choose to operationalize the frequency variable as the number of purchases made in a certain time period (Hist versus Year) while in- or excluding returned merchandise (R versus N).

In the direct marketing literature, the general convention is that the more money a person has spent with a company, the higher his/her likelihood of purchasing the next offering [27]. Nash suggests to operationalize monetary value as the highest transaction sale or as the average order size [12]. Levin and Zahavi propose to use the average amount of money per purchase [27]. We model the monetary variable as the total accumulated monetary amount of spending by a customer during a certain time period (Hist versus Year) while in- or excluding returned merchandise (R versus N). Additionally, we include the natural logarithmic transformation (ln) of all monetary variables as a means to reduce the skewness of the distributions.

Apart from the RFM variables, we also included nine other customer profiling inputs [11]. The type and frequency of contact which customers have with the mail-order company may yield important information about their future purchasing behavior. GenInfo and GenCust are binary customer/company interaction variables indicating whether the customer asked for general information (respectively, filed general complaints). Since customer (dis)satisfaction may not only be revealed by general complaints but also by returning items, we included two extra variables. The RetMerch variable is a binary variable indicating whether the customer has ever returned an item that was previously ordered from the mail-order company. The RetPerc variable measures the total monetary amount of returned orders divided by the total amount of spending. The Ndays variable models the length of the customer relationship in days. It is commonly believed that consumers/households with a longer relationship with the company have a higher probability of repurchase than households with shorter relationships.

IncrHist and IncrYear are operationalizations of a behavioral loyalty measure. We propose to perform a median split of the length of the relationship (time since the household became a customer). This enables us to compare the number of purchases (i.e., frequency) between the first and last half of the time window. The following formula is used:

\[ \frac{\text{purchases second half} - \text{purchases first half}}{\text{purchases first half}} \tag{1} \]

When the above measure is positive, this may give us an indication of increasing loyalty by that customer to the (mail-order) company, and ipso facto satisfaction with the current level of service. Remember that the suffix Hist reflects that the whole purchase history is used, whereas in the case of the suffix Year, only transactions from the last year are included.

The ProdclaT and ProdclaM variables represent the total (T) and mean (M) forward-looking weighted product index, respectively. The weighting procedure represents the forward-looking nature of a product category purchase, derived from another sample of data.

Table II gives an overview of the variables discussed above. Notice that all missing values were handled by the mean imputation procedure [28] and that all predictor variables were normalized to zero mean and unit variance prior to their inclusion in the model [29].

Table II. A listing of all inputs (both RFM and non-RFM) included in the direct marketing case.

Recency     Frequency   Monetary        Other
RecYearR    FrYearR     MonHistR        ProdclaT
RecYearN    FrYearN     MonHistN        ProdclaM
RecHistR    FrHistR     MonYearR        GenCust
RecHistN    FrHistN     MonYearN        GenInfo
                        ln(MonHistR)    Ndays
                        ln(MonHistN)    IncrHist
                        ln(MonYearR)    IncrYear
                        ln(MonYearN)    RetMerch
                                        RetPerc

III. LEAST SQUARES SVM CLASSIFICATION

A. LS-SVMs for Binary Classification

Given a training set $\{x_i, y_i\}_{i=1}^{N}$ with input data $x_i \in \mathbb{R}^n$ and corresponding binary class labels $y_i \in \{-1, +1\}$, the SVM classifier, according to Vapnik's original formulation [30-33], satisfies the following conditions:

\[ \begin{cases} w^T \varphi(x_i) + b \geq +1, & \text{if } y_i = +1 \\ w^T \varphi(x_i) + b \leq -1, & \text{if } y_i = -1 \end{cases} \tag{2} \]

which is equivalent to:

\[ y_i \left[ w^T \varphi(x_i) + b \right] \geq 1, \quad i = 1, \ldots, N \tag{3} \]

The nonlinear function $\varphi(\cdot): \mathbb{R}^n \to \mathbb{R}^{n_h}$ maps the input space to a high dimensional (and possibly infinite dimensional) feature space. In primal weight space the classifier then takes the form:

\[ y(x) = \operatorname{sign}\left[ w^T \varphi(x) + b \right] \tag{4} \]

however, it is never evaluated in this form. One defines the optimization problem as:

\[ \min_{w, b, \xi} \mathcal{J}(w, \xi) = \frac{1}{2} w^T w + c \sum_{i=1}^{N} \xi_i \tag{5} \]

subject to:

\[ \begin{cases} y_i \left[ w^T \varphi(x_i) + b \right] \geq 1 - \xi_i, & i = 1, \ldots, N \\ \xi_i \geq 0, & i = 1, \ldots, N \end{cases} \tag{6} \]

The variables $\xi_i$ are slack variables which are needed to allow misclassifications in the set of inequalities (e.g., due to overlapping distributions). The positive real constant $c$ should be considered as a tuning parameter in the algorithm. For nonlinear SVMs, the QP-problem and the classifier are never solved and evaluated in this form. Instead, a dual space formulation and representation are obtained by applying the Mercer condition (see Refs. 30-33 for details).

Vapnik's SVM classifier formulation was modified by Suykens and Vandewalle [2] into the following LS-SVM formulation:

\[ \min_{w, b, e} \mathcal{J}(w, e) = \frac{1}{2} w^T w + \gamma \frac{1}{2} \sum_{i=1}^{N} e_i^2 \tag{7} \]

subject to the equality constraints:

\[ y_i \left[ w^T \varphi(x_i) + b \right] = 1 - e_i, \quad i = 1, \ldots, N \tag{8} \]

This formulation now consists of equality instead of inequality constraints and takes into account a squared error with a regularization term similar to ridge regression. The solution is obtained after constructing the Lagrangian:

\[ \mathcal{L}(w, b, e; \alpha) = \mathcal{J}(w, e) - \sum_{i=1}^{N} \alpha_i \left\{ y_i \left[ w^T \varphi(x_i) + b \right] - 1 + e_i \right\} \tag{9} \]

where $\alpha_i$ are the Lagrange multipliers. After taking the conditions for optimality, one obtains the following linear system [2]:

\[ \begin{bmatrix} 0 & -Y^T \\ Y & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ \mathbf{1} \end{bmatrix} \tag{10} \]

where $Z = [\varphi(x_1)^T y_1; \ldots; \varphi(x_N)^T y_N]$, $Y = [y_1; \ldots; y_N]$, $\mathbf{1} = [1; \ldots; 1]$, $\alpha = [\alpha_1; \ldots; \alpha_N]$, $\Omega = Z Z^T$, and Mercer's condition is applied within the $\Omega$ matrix:

\[ \Omega_{ij} = y_i y_j \varphi(x_i)^T \varphi(x_j) = y_i y_j K(x_i, x_j) \]

For the kernel function $K(\cdot, \cdot)$ one typically has the following choices:

\[ \begin{aligned} K(x, x_i) &= x_i^T x && \text{(linear kernel)} \\ K(x, x_i) &= (x_i^T x + 1)^d && \text{(polynomial kernel of degree } d\text{)} \\ K(x, x_i) &= \exp\{ -\| x - x_i \|_2^2 / \sigma^2 \} && \text{(radial basis function (RBF) kernel)} \\ K(x, x_i) &= \tanh(\kappa\, x_i^T x + \theta) && \text{(multilayer perceptron (MLP) kernel)} \end{aligned} \tag{11} \]

where $d$, $\sigma$, $\kappa$, and $\theta$ are constants. Notice that the Mercer condition holds for all $\sigma$ and $d$ values in the RBF and the polynomial cases, but not for all possible choices of $\kappa$ and $\theta$ in the MLP case. The LS-SVM classifier is then constructed as follows:

\[ y(x) = \operatorname{sign}\left[ \sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b \right] \tag{12} \]

Note that the matrix in (10) is of dimension $(N+1) \times (N+1)$. For large values of $N$, this matrix cannot easily be stored, such that an iterative solution method for solving it is needed. A Hestenes-Stiefel conjugate gradient algorithm is suggested in Ref. 34 to overcome this problem. Basically, the latter rests upon a transformation of the matrix in (10) to a positive definite form [34]. A straightforward extension of LS-SVMs to multiclass problems has been proposed in Ref. 35, where additional outputs are taken to encode multiclasses as is often done in classical neural network methodology [29]. A drawback of LS-SVMs is that sparseness is lost due to the choice of a 2-norm. However, this can be circumvented in a second stage by a pruning procedure which is based upon removing training points guided by the sorted support value spectrum [36].

B. Calibrating the RBF LS-SVM Classifier

All classifiers were trained using RBF kernels [3]. Estimation of the generalization ability of the RBF LS-SVM classifier is then realized by the following experimental setup [3]:

(1) Set aside 3/4 of the data for the training set and the remaining 1/4 for testing, respecting the original class distribution.
(2) Perform 10-fold cross validation on the training data for each $(\sigma, \gamma)$ combination from the initial candidate tuning sets $\Sigma_0$ and $\Gamma_0$ typically chosen as follows:

\[ \Sigma_0 = \{0.5, 5, 10, 15, 25, 50, 100, 250, 500\} \times \sqrt{n} \]
\[ \Gamma_0 = \{0.01, 0.5, 1, 10, 50, 100, 500, 1000\} \times \frac{1}{N} \]

    The square root $\sqrt{n}$ of the number of inputs $n$ is introduced, since $\| x - x_i \|_2^2$ in the RBF kernel is proportional to $n$, and the factor $1/N$ is introduced such that the misclassification term $\gamma \sum_{i=1}^{N} e_i^2$ is normalized with the size of the data set.
(3) Choose optimal $(\sigma, \gamma)$ from the initial candidate tuning sets $\Sigma_0$ and $\Gamma_0$ by looking at the best cross validation performance for each $(\sigma, \gamma)$ combination.
(4) Refine $\Sigma$ and $\Gamma$ iteratively by means of a grid search mechanism to further optimize the tuning parameters $(\sigma, \gamma)$. In our experiments, we repeated this step three times.
(5) Construct the LS-SVM classifier using the total training set for the optimal choice of the tuned hyperparameters $(\sigma, \gamma)$.
(6) Assess the generalization ability by means of the independent test set.

Following the procedure outlined above, one obtained the results depicted in Table III. The optimized RBF LS-SVM classifier, trained on 3/4 of the data set, achieves a percentage correctly classified on the training data of 77.54%, with tuned hyperparameter values of 13.75 and 1.50. Performance on the independent test data amounts to 74.48% correctly classified. We contrasted these results with those obtained using a linear kernel for the LS-SVM classifier. As can be observed from Table III, the percentage correctly classified drops to 76.26% on the training set and to 73.76% on the independent test set.

IV. THE INPUT SELECTION EXPERIMENT

A. Input Selection in a Nutshell

Input selection is a commonly adhered technique to reduce model complexity. The goal is to find a reduced coordinate system that allows one to project the data on a more compact representation. The general assumption underlying this operation and justifying it, is that the studied data approximately lie within the bounds of this reduced space. As such, models with fewer inputs are capable of improving both human understanding and computational performance. Moreover, elimination of redundant and/or irrelevant inputs may also improve the predictive power of an algorithm [37].

Selecting the best subset of a set of $n$ predictors is a nontrivial problem.
This follows from the fact that the optimal input subset can only be obtained when the input space is exhaustively searched. When $n$ inputs are present, this would imply the need to evaluate $2^n - 1$ input subsets. Unfortunately, as $n$ grows, this very quickly becomes computationally infeasible. For that reason, heuristic search procedures are often preferred. Input selection can then either be performed as a preprocessing step, independent of the induction algorithm, or explicitly make use of it. The former approach is termed filter, the latter wrapper [38].

Filter methods operate independently of the learning algorithm. Undesirable inputs are filtered out of the data before induction commences. Focus [39] and Relief [40] are well-known filter methods. Wrapper methods make use of the actual target learning algorithm to evaluate the usefulness of inputs. Typically, the input evaluation heuristic that is used is based upon inspection of the trained parameters and/or comparison of predictive performance under different input subset configurations. Input selection is then often performed in a sequential fashion, e.g., guided by a best-first input selection strategy. The backward selection scheme starts from a full input set and stepwise prunes input variables that are undesirable. The forward selection scheme starts from the empty input set and stepwise adds input variables that are desirable. Hybrids of the above also exist.

Table III. Classification accuracy of the optimized RBF LS-SVM classifier versus an LS-SVM optimized using a linear kernel.

Classification Accuracy          LS-SVM RBF Kernel    LS-SVM Linear Kernel
Training (3750 observations)     77.54%               76.26%
Test (1250 observations)         74.48%               73.76%

B. Wrapping the Optimized LS-SVM Classifier

Input selection effectively starts at the moment the LS-SVM classifier has been constructed on the full set of $n$ available predictors. The input selection procedure is based upon a (greedy) best-first heuristic, guiding a backward search mechanism through the input space [38]. The mechanics of the implemented heuristic for assessing the sensitivity of the classifier to a certain input are quite straightforward. We apply a strategy of constant substitution in which an input is perturbed to its mean while all other inputs keep their values and compute the impact of this operation on the performance of the obtained LS-SVM classifier without reestimation of the LS-SVM parameters $\alpha$ and $b$. This assessment is done using the separate pruning set to obtain an unbiased estimate of the change in classification accuracy of the constructed classifier. The pruning set consists of 1250 observations that were randomly selected from the training set of 3750 observations. Figure 1 provides a concise overview of the different steps of the experimental procedure.

[Figure 1]
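The constant-substitution heuristic can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it trains a small RBF LS-SVM by solving linear system (10) directly (workable only for small N), uses synthetic data, and takes the substitution value from the training-set mean. The function names, the data, and the values `sigma=2.0` and `gamma=10.0` are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """K(a, b) = exp(-||a - b||^2 / sigma^2), the RBF kernel of (11)."""
    sq_dist = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma ** 2)


def lssvm_fit(X, y, sigma, gamma):
    """Solve the (N+1) x (N+1) linear system (10) directly for b and alpha."""
    N = X.shape[0]
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = -y                            # first row: -Y^T alpha = 0
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(N) / gamma    # Omega + gamma^{-1} I
    rhs = np.concatenate(([0.0], np.ones(N)))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]         # b, alpha


def lssvm_predict(X_train, y_train, b, alpha, sigma, X_new):
    """Classifier (12): sign(sum_i alpha_i y_i K(x, x_i) + b)."""
    scores = rbf_kernel(X_new, X_train, sigma) @ (alpha * y_train) + b
    return np.where(scores >= 0.0, 1.0, -1.0)


def input_sensitivities(X_train, y_train, b, alpha, sigma, X_prune, y_prune):
    """Accuracy drop on the pruning set when one input is set to its mean."""
    def accuracy(X):
        pred = lssvm_predict(X_train, y_train, b, alpha, sigma, X)
        return float(np.mean(pred == y_prune))

    baseline = accuracy(X_prune)
    means = X_train.mean(axis=0)             # assumption: training-set means
    drops = np.empty(X_prune.shape[1])
    for j in range(X_prune.shape[1]):
        X_perturbed = X_prune.copy()
        X_perturbed[:, j] = means[j]         # alpha and b are NOT re-estimated
        drops[j] = baseline - accuracy(X_perturbed)
    return baseline, drops


# Synthetic illustration: inputs 0 and 1 carry signal, inputs 2 and 3 are noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 4))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1.0, -1.0)
X_train, y_train = X[:200], y[:200]
X_prune, y_prune = X[200:], y[200:]

b, alpha = lssvm_fit(X_train, y_train, sigma=2.0, gamma=10.0)
baseline, drops = input_sensitivities(
    X_train, y_train, b, alpha, 2.0, X_prune, y_prune
)
least_sensitive = int(np.argmin(drops))      # candidate for pruning
```

The input with the smallest accuracy drop is the one the classifier is least sensitive to, which makes it the pruning candidate under this heuristic. For data sets of the size used in the paper, the direct solve would be replaced by an iterative method such as the conjugate gradient approach of Ref. 34.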

Starting with a full input set $F_1$, all $n$ inputs are pruned sequentially, i.e., one by one. The first input $f_p$ to be removed is determined at the end of Step 1 [task (4)]. After having removed this input from $F_1$, the reduced input set $F_2 = F_1 \setminus \{f_p\}$ is used for subsequent input removal. At this moment, an iteration of identical Steps $i$ is started, in which, in a first phase, the LS-SVM parameters $\alpha$ and $b$ are re-estimated on the training set [task (1) of Step $i$], however, without recalibration for $\sigma$ and $\gamma$, and the generalization ability of the classifier is quantified on the independent test set [task (2) of Step $i$]. Notice that the originally optimized $\sigma$ and $\gamma$ values obtained in task (1) of Step 1 remain unchanged during the entire input selection phase. Again, input sensitivities of the resulting classification model (without re-estimation of $\alpha$ and $b$) are assessed on the pruning set to identify the input to which the classifier is least sensitive when perturbed to its mean [task (3) of Step $i$]. This input is then pruned from the remaining input subset and disregarded for further analysis. The pruning procedure is thereupon resumed with a reduced input set, until all inputs are eventually removed. Once all inputs have been pruned, the preferred reduced model is then determined by means of the highest pruning set performance.

Table IV summarizes the empirical findings of the pruning procedure for the RFM case. Observe how the suggested input selection method allows significant reduction of model complexity (from 25 to 9 inputs) without any significant degradation of the generalization behavior on the independent test set. The test set performance amounts to 73.92% for the full model and 73.52% for the reduced model. The order of input removal as depicted in Table V provides further insight into the relative importance of the predictor categories (cf. Table II). The reduced model consists of the nine inputs that are underlined in Table V. This reduced set of predictors consists of frequency, monetary, and other (non-RFM) variables.
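The stepwise loop described above can be sketched generically, as a hedged illustration rather than the authors' code. To keep the block short, the learner is a stand-in ridge-regularized least squares classifier instead of the calibrated RBF LS-SVM, the independent test-set evaluation of each step is omitted, and ties in pruning-set accuracy are broken in favor of the smaller input set. All names, the synthetic data, and the tie-breaking rule are assumptions.

```python
import numpy as np


def fit_ridge_classifier(X, y, reg=1.0):
    """Stand-in learner (NOT the paper's LS-SVM): ridge regression on +/-1 labels."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # append a bias column
    penalty = reg * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0                            # leave the bias unpenalized
    w = np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

    def predict(Z):
        Zb = np.hstack([Z, np.ones((Z.shape[0], 1))])
        return np.where(Zb @ w >= 0.0, 1.0, -1.0)

    return predict


def accuracy(predict, X, y):
    return float(np.mean(predict(X) == y))


def backward_prune(fit, X_train, y_train, X_prune, y_prune):
    """Greedy backward input selection guided by mean-substitution sensitivity."""
    active = list(range(X_train.shape[1]))
    history = []          # (inputs in use, pruning-set accuracy) for each step
    removal_order = []
    while active:
        # Re-estimate model parameters on the current inputs;
        # hyperparameters stay fixed throughout.
        predict = fit(X_train[:, active], y_train)
        Xp = X_prune[:, active]
        base = accuracy(predict, Xp, y_prune)
        history.append((list(active), base))
        col_means = X_train[:, active].mean(axis=0)
        drops = []
        for k in range(len(active)):
            Xk = Xp.copy()
            Xk[:, k] = col_means[k]                  # perturb one input to its mean
            drops.append(base - accuracy(predict, Xk, y_prune))
        least = int(np.argmin(drops))                # least sensitive input
        removal_order.append(active.pop(least))
    # Preferred model: highest pruning-set accuracy (smaller set wins ties).
    best_inputs, best_acc = max(reversed(history), key=lambda h: h[1])
    return removal_order, best_inputs, best_acc


# Synthetic illustration: only inputs 0 and 3 carry signal.
rng = np.random.default_rng(1)
X = rng.standard_normal((600, 5))
y = np.where(2.0 * X[:, 0] + X[:, 3] > 0, 1.0, -1.0)
order, best_inputs, best_acc = backward_prune(
    fit_ridge_classifier, X[:400], y[:400], X[400:], y[400:]
)
```

Passing the learner in as a `fit` callable keeps the wrapper independent of the classifier, so an LS-SVM trainer with fixed hyperparameters could be substituted without changing the loop.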
It is especially important to note that the reduced model includes information on returned merchandise. Furthermore, notice the absence of the recency component in the reduced input set. Inspection of the order of removal of inputs while further pruning this reduced input set highlights the relative importance of the frequency variables. More specifically, the last two variables to be removed belong to this predictor category. Note that an input set consisting of only these two inputs still yields a percentage correctly classified of 72.00% on the test set. Results also point to the beneficial effect of including the non-RFM customer profiling variables GenInfo and GenCust for improving predictive accuracy. They underline that customer-company interaction variables, here measured by indicators of information requests and complaints, provide additional predictive power to purchase incidence modeling for database marketing.

Table IV. Empirical assessment of the RBF LS-SVM classifiers for the full and reduced models.

Classification accuracy         Full model     Reduced model
Training (2500 observations)    77.36%         76.04%
Pruning (1250 observations)     76.72%         77.20%
Test (1250 observations)        73.92%         73.52%
Number of inputs                25             9

Table V. Order of input removal. Each input is qualified by its category, with r, f, m and o respectively standing for recency, frequency, monetary and other (cf. Table II). Within each column, inputs are listed from top to bottom in order of removal. An asterisk marks the nine inputs of the reduced model.

Steps 1-5         Steps 6-10        Steps 11-15       Steps 16-20       Steps 21-25
RetPerc o         ProdclaM o        RecHstN r         FrYearN f         MonYearR m *
ln(MonHstN) m     MonHstR m         IncrHst o         ln(MonHstR) m *   MonYearN m *
RecHstR r         IncrYear o        RecYearR r        MonHstN m *       GenInfo o *
Ndays o           ln(MonYearR) m    RecYearN r        GenCust o *       FrHstR f *
ProdclaT o        ln(MonYearN) m    FrHstN f          RetMerch o *      FrYearR f *

V. CONCLUSION

In this paper, we applied an LS-SVM based input selection wrapper to a real-life direct marketing case involving the modeling of repeat-purchase behavior. Based on a thorough review of the literature, we extended the well-known recency, frequency, monetary (RFM) framework (1) by using alternative operationalizations of the original variables, and (2) by including several additional behavioral variables. The sensitivity based, stepwise input selection method, constructed as a wrapper around the LS-SVM classifier, allows significant reduction of model complexity without degrading predictive performance. The empirical findings highlight the role of frequency and monetary variables in the reduced model, while the recency variable category seems to be of somewhat lesser importance within the response model. Results also point to the beneficial effect of including non-RFM customer profiling variables for improving predictive accuracy. More specifically, customer-company interaction, measured by indicators of information requests and complaints, and merchandise returns provide additional predictive power to purchase incidence modeling for database marketing.

This work was partly carried out at the Leuven Institute for Research on Information Systems (LIRIS)
of the Department of Applied Economic Sciences of the K.U. Leuven in the framework of the KBC Insurance Research Chair. This work was partly carried out at the ESAT laboratory and the Interdisciplinary Center of Neural Networks ICNN of the K.U. Leuven. S. Viaene, holder of the KBC Insurance Research Chair, and B. Baesens are both Research Assistants of LIRIS. T. Van Gestel is a research assistant with the Fund for Scientific Research Flanders (FWO-Flanders), J. Suykens is a postdoctoral researcher with the Fund for Scientific Research Flanders (FWO-Flanders), B. De Moor is a senior research associate with the Fund for Scientific Research Flanders (FWO-Flanders). D. Van den Poel is an assistant professor at the Department of

Marketing at Ghent University. J. Vanthienen and G. Dedene are Senior Research Associates of LIRIS.