LEM Working Paper 2011-06

A PROBABILITY-MAPPING ALGORITHM FOR CALIBRATING THE POSTERIOR PROBABILITIES: A DIRECT MARKETING APPLICATION

Kristof Coussement*, Wouter Buckinx**
* IESEG School of Management (LEM-CNRS)
** Python Predictions, Brussels, Belgium
Kristof Coussement*, Wouter Buckinx+
* IESEG School of Management (LEM-CNRS), Department of Marketing, 3 Rue de la Digue, F-59000, Lille (France).
+ Managing partner (PhD), Python Predictions, Avenue R. Van den Driessche 9, B-1150 Brussels, Belgium.
First and corresponding author: Kristof Coussement, K.Coussement@ieseg.fr, Tel.: +33320545892
Second author: Wouter Buckinx, Wouter.Buckinx@pythonpredictions.com
This paper is accepted for publication in European Journal of Operational Research.
Abstract
Calibration refers to the adjustment of the posterior probabilities output by a classification algorithm towards the true prior probability distribution of the target classes. This adjustment is necessary to account for the difference in prior distributions between the training set and the test set. This article proposes a new calibration method, called the probability-mapping approach. Two types of mapping are proposed: linear and non-linear probability mapping. These new calibration techniques are applied to 9 real-life direct marketing datasets. The newly-proposed techniques are compared with the original, non-calibrated posterior probabilities and with the adjusted posterior probabilities obtained using the rescaling algorithm of Saerens, Latinne, & Decaestecker (2002). The results indicate that marketing researchers should calibrate the posterior probabilities obtained from the classifier. Moreover, a simple rescaling algorithm is not sufficient as a first, workable solution: the best calibration performance is obtained with the newly-proposed non-linear probability-mapping approach.
Keywords: data mining, direct marketing, response modeling, calibration, decision support systems
1. Introduction
Due to recent developments in IT infrastructure and the ever-increasing trust placed in complex computer systems, analysts show a growing interest in classification modeling in a variety of disciplines such as credit scoring (Martens et al., 2010; Paleologo et al., 2010), medicine (Conforti & Guido, 2010), text classification (Bosio & Righini, 2007), SME fund management (Kim & Sohn, 2010) and revenue management (Morales & Wang, 2010). The direct marketing community shares this interest. Direct marketing analysts are increasingly interested in building prediction models that assign a probability of response to every individual customer in the database (Lamb et al., 1994). The classification task is made even more interesting by the fact that current marketing environments store vast amounts of customer information at a very low cost, including socio-demographics, transactional buying behavior and attitudinal data (Naik et al., 2000), while academic interest in direct marketing applications has grown tremendously (e.g. Allenby et al., 1999; Baumgartner & Hruschka, 2005; Hruschka, 2010; Lee et al., 2010; Piersma & Jonker, 2004). In this context, response models are classification models that attempt to discriminate between responders and non-responders to a particular company mailing. In the past, purely statistical methods like logistic regression, discriminant analysis and naive Bayes models have been proposed to discriminate between responders and non-responders in a direct marketing context (Baesens et al., 2002; Bult, 1993; Deichmann et al., 2002). Although these techniques may be very effective, they make a stringent assumption about the underlying relationship between the independent variables and the dependent or response variable.
In response to this, more advanced data mining algorithms like decision tree-generating techniques, artificial neural networks and support vector machines have been applied (Baesens et al., 2002; Bose & Chen, 2009; Crone et al., 2006; Haughton & Oulabi, 1997; Zahavi & Levin, 1997). All these binary classification models are used for two reasons. First, researchers rely on them to obtain robust parameter estimates of the independent variables by modeling the probability of response as a function of the independent variables. Second, these models are used to obtain consistent predicted probabilities of response, which are then used (i) to rank the customers based on their
responsiveness to the campaign, (ii) to optimize the overall campaign strategy by offering each customer the product with the highest response probability across the different response models, and (iii) for the discrimination task itself, where customers are classified into responders and non-responders. For (ii) and (iii), the absolute size of the posterior response probabilities is crucial. This study focuses on obtaining correct response probabilities: calibrating the posterior probabilities could improve both the optimization of the overall campaign strategy and the efficiency of the discrimination task.

In practice, a classification model is built on a training set, i.e. a set of customers for whom both the independent variables and the dependent variable are observed. To measure the discrimination power of the trained classifier correctly, the classification model is applied to a group of customers who were not used for training, called the scoring or test set. The purpose is to obtain robust and consistent predictions of the response probability for these unseen customers. As one is interested in dividing the customers into responders and non-responders, a judicious classification based on the posterior response probabilities is needed: customers whose response probability exceeds a certain threshold are classified as responders, and the others as non-responders. However, a classifier is often trained on a dataset that does not reflect the true prior probabilities of the target classes in the real-life population. This may seriously harm the discrimination performance, because the posterior probabilities do not reflect the true probability of interest. This phenomenon also occurs in direct marketing, where the prior probabilities in the training set and in the (out-of-sample) test set differ significantly.
More specifically, the training set consists of customers who were preselected by an earlier response model as having a high response probability, while the test set imposes no restrictions based on the customer profiles in the database. In such a case, a large discrepancy exists between the response distributions in the training set and the test set. The incidence, i.e. the percentage of responders in a dataset, is much higher in the training set than the incidence of real response in the out-of-sample test set. This inconsistency harms the discrimination performance on the test set, especially because the classifier's decision to classify customers into
responders or non-responders is based on setting a threshold on the raw posterior probabilities of class membership. For instance, when a classifier is trained on a dataset with a higher incidence than the test set, the posterior probabilities on the test set are inflated. Making a classification decision based on the absolute value of the posterior probabilities may then significantly harm the discrimination performance. Moreover, optimizing the campaign strategy by offering each customer the product with the highest response probability becomes useless, because the response probabilities of different products for a particular customer are not comparable.

This paper focuses on how researchers can adjust the posterior probabilities to the true prior distribution of the response variable. This adjustment process is called calibration. This paper proposes a new methodology, called probability-mapping, to calibrate the posterior probabilities of the test set to the real-world situation. It maps the posterior response probabilities obtained from the classifier onto the prior distribution of real response. The new probability-mapping approaches, based on generalized linear models and non-parametric generalized additive models, are compared with the original, non-calibrated posterior probabilities and with the probabilities calibrated using the rescaling methodology of Saerens et al. (2002).

This paper is structured as follows. Section 2 describes the methodological framework, while Section 3 explores the different calibration approaches (rescaling approaches and probability-mapping approaches). Section 4 explains the characteristics of the empirical validation, while Section 5 presents the results. Section 6 gives managerial recommendations, and Section 7 concludes this paper.

2. Methodological framework
Figure 1 shows the methodological framework for the different calibration methods applied in this study.
[INSERT FIGURE 1 OVER HERE]

Define a training set $\mathrm{TRAIN}_M = \{(x_i, y_i)\}_{i=1}^{m}$ consisting of m customers. Each customer $(x_i, y_i)$ is a combination of an input vector $x_i$ representing the independent variables and a dependent variable $y_i \in \{0, 1\}$ indicating whether or not the customer responded to a certain mailing. $\mathrm{TRAIN}_M$ consists of all customers who were selected by a previous response model as having a high response probability, and who therefore received a direct mailing to buy the product. During the training phase, a classifier C maps the input vector space onto the binary response variable using the training set observations. The trained classifier C is then applied to the test set $\{(x_i)\}_{i=1}^{n}$ of n customers, and for every customer in the test set a response probability $P_{org}$ is obtained. The purpose of this paper is to adjust the posterior probabilities $P_{org}$ to the real response distribution, because the training sample $\mathrm{TRAIN}_M$ is not representative of the test set, which corresponds to the true population. Therefore, for every observation in the test set, the real response is collected and summarized in $\mathrm{REAL} = \{(y_i)\}_{i=1}^{n}$, with $y_i \in \{0, 1\}$ indicating whether or not the customer spontaneously bought that particular product in a time window without direct mailing actions. The real response represents a response of pure interest in the product. In other words, REAL is used to represent the true prior probabilities. The purpose of the calibration phase is to adjust $P_{org}$, the non-calibrated posterior probabilities of the test set, so that they truly represent the probability of response.

To benchmark the different calibration methods methodologically, a k-fold cross-validation is applied. In a k-fold cross-validation, the dataset is randomly split into k equal parts; each part in turn is used during the scoring phase, while the other k-1 parts are used to train the calibration model. Note that $\mathrm{REAL}_k$ represents the k-th fold of REAL (the k-th fold of the test set is defined analogously), while $P_{korg}$ represents the non-calibrated posterior probabilities of the test set in fold k.

3. Calibration approaches
Two types of calibration methods are applied: (i) the rescaling algorithm of Saerens et al. (2002) and (ii) the newly-proposed probability-mapping approaches. The former rescales $P_{korg}$, the posterior probabilities of the test data in fold k, taking into account the real incidence of REAL (Saerens et al., 2002), while the latter adjusts the posterior probabilities of the test data by mapping them onto the real responses of REAL.

3.1 Rescaling algorithm (SAERENS)
This section explains the methodology of Saerens et al. (2002). Their calibration approach starts from Bayes' rule: the posterior probabilities of response depend in a non-linear way on the prior probability distribution of the target classes. The prior probability of the target class is its incidence, in this setting the percentage of responders in the dataset. Therefore, a change in the prior probability distribution of the target classes changes the posterior response probabilities of the classification model. Saerens et al. (2002) describe a procedure that adjusts the posterior probabilities of response output by the classifier to the new prior probability distribution of the target classes using a predefined rescaling formula. In detail, the calibrated posterior probabilities of response for the customers in the test set of fold k are obtained by weighting the non-calibrated posterior probabilities, $P_{korg}$, by the ratio of the response incidence of REAL, i.e. the new prior probability, to the response incidence in the training set, i.e. the old prior probability. The denominator is a scaling factor that ensures that the calibrated posterior probabilities of the two classes sum to one. In summary,

$$P_{knew} = \frac{\dfrac{P_k(c_1)}{P_{kt}(c_1)}\,P_{korg}}{\dfrac{P_k(c_0)}{P_{kt}(c_0)}\,(1 - P_{korg}) + \dfrac{P_k(c_1)}{P_{kt}(c_1)}\,P_{korg}} \qquad (1)$$

with $P_{knew}$ the calibrated posterior response probabilities in fold k, and $P_k(c_i)$ and $P_{kt}(c_i)$ the new and old prior probabilities of class $c_i$, $i \in \{0, 1\}$. A dataset NEW is
obtained, which contains $P_{knew}$, the calibrated posterior probabilities for the test data of fold k.

3.2 Probability-mapping approaches
The purpose of the probability-mapping approaches is to map $P_{korg}$, the old posterior probabilities of the test data in fold k, onto the real response probabilities of REAL. In this way, one builds a model that maps the non-calibrated probabilities onto the real response probabilities; this model is then used to replace the old probabilities with corrected probabilities of response. However, the real probability distribution of the target classes is not directly available from REAL, which only contains the real responses $y_i \in \{0, 1\}$ at the individual customer level. To convert these individual real responses into a real response probability distribution, a number of bins b is constructed. The incidence of response, i.e. the percentage of real responders, is calculated per bin and is used as an approximation of the real probability of response in that bin. In practice, both the test set and REAL are split into b bins using equal-frequency binning based on the posterior probabilities of the test set. The b-th bin of the k-th fold of the test set and the b-th bin of the k-th fold of REAL (denoted $\mathrm{REAL}_{kb}$) logically contain identical observations. $P_{kborg}$ is the average non-calibrated posterior probability in the b-th bin of the test set, and $P_{kbreal}$ is the percentage of real responders in the b-th bin of REAL. $P_{kbreal}$ serves as a proxy for the true prior probability. To formalize the relationship between the average posterior probabilities of the test set and the approximate real probabilities obtained from REAL, a mapping is estimated on the binned training set of fold k (i.e. the k-1 folds used for training):

$$P_{kbreal} = f_k(P_{kborg}) \qquad (2)$$

with $f_k$ the classifier that maps the non-calibrated posterior probabilities onto the real probabilities in fold k. After $f_k$ is built, it is applied to the unseen test data of fold k
to obtain the new posterior probabilities $P_{knew}$ for every individual in the test data of the k-th fold. A new dataset NEW is obtained, which contains $P_{knew}$, the calibrated posterior probabilities.

There are several possibilities for $f_k$, the function that links the estimated, non-calibrated probabilities of the test-set bins to the approximated real probabilities of $\mathrm{REAL}_{kb}$. This study uses one linear probability-mapping approach based on generalized linear models (Section 3.2.1) and three non-linear approaches: one based on generalized linear models with log-transformed non-calibrated probabilities (Section 3.2.2) and two based on generalized additive models (Sections 3.2.3 and 3.2.4).

3.2.1 Generalized linear model (GLM)
Let y be the dependent variable, with $y \in [0, 1]$ representing $P_{kbreal}$, the averaged true prior probabilities from $\mathrm{REAL}_{kb}$, and let x equal $P_{kborg}$, the averaged posterior probabilities of the test-set bins. A generalized linear model with logit link function is employed to model $f_k(x) \in [0, 1]$. It assumes that the relationship between $P_{kborg}$ and $P_{kbreal}$ is linear in the log-odds:

$$\mathrm{logit}(y) = \log\frac{y}{1 - y} = \alpha_k + \beta_k x \qquad (3)$$

or

$$y = f_k(x) = \mathrm{logit}^{-1}(\alpha_k + \beta_k x) \qquad (4)$$

with $\alpha_k$ the intercept and $\beta_k x$ the predictor. The parameters $\alpha_k$ and $\beta_k$ are estimated by maximum likelihood (Tabachnick & Fidell, 1996).

3.2.2 Generalized linear model with log transformation (LOG)
Another approach is to log-transform x in equations (3) and (4), i.e. to replace x by ln x. This captures non-linearity in the log-odds space between y, i.e. $P_{kbreal}$, the true prior probabilities from $\mathrm{REAL}_{kb}$, and x, i.e. $P_{kborg}$, the posterior probabilities of the test-set bins.

3.2.3 Generalized additive models
An attractive alternative to standard generalized linear models is generalized additive models (Hastie & Tibshirani, 1986, 1987, 1990). Generalized additive models relax the linearity constraint and apply a non-parametric, non-linear fit to the data. In other words, the data themselves determine the functional form between the independent variable and the dependent variable. Define y as the dependent variable, with $y \in [0, 1]$ representing $P_{kbreal}$, the true prior probabilities from $\mathrm{REAL}_{kb}$, and x equal to $P_{kborg}$, the posterior probabilities of the test-set bins. To model $f_k(x) \in [0, 1]$, generalized additive models with a logit link function are employed. Methodologically, generalized additive models generalize the generalized linear model by replacing the linear predictor component $\beta_k x$ in equation (4) with an additive component:

$$y = f_k(x) = \mathrm{logit}^{-1}\bigl(\alpha_k + s_k(x)\bigr) \qquad (5)$$

with $s_k(x)$ a smooth function. This study uses penalized regression splines $s_k(x)$ to estimate the non-parametric trend for the dependency of y on x (Wahba, 1990; Green & Silverman, 1994). These smooth functions use a large number of knots, which makes the model quite insensitive to the knot locations, while a penalty term avoids the over-fitting that would otherwise accompany the use of many knots. The complexity of the model is controlled by a smoothing parameter λ, which is inversely related to the degrees of freedom (df). If λ is small (i.e. the df are large), a very complex model that closely matches the data is obtained; if λ is large (i.e. the df are small), a smooth model results. The generalized additive model is fitted by penalized likelihood maximization using penalized iteratively reweighted least squares (Wood, 2000; 2004; 2008).
3.2.4 Generalized additive models with monotonicity constraint
Because generalized additive models produce a non-linear relationship between the independent variable $P_{kborg}$ and the dependent variable $P_{kbreal}$, the original ranking of the posterior probabilities of the test set and the ranking of their calibrated version may differ. However, marketing analysts could argue that the ranking of the customers in the test set given by the initial classifier C, i.e. the mapping learned on $\mathrm{TRAIN}_M$, should be conserved. Therefore, a non-decreasing monotonicity constraint on the predictions of the generalized additive models is introduced to retain the original ranking of the customers. Inspired by rule-set creation advances in the post-learning phase (e.g. pedagogical rule-extraction techniques as employed in Martens et al. (2007)), a rule set is produced on the training set of fold k in the post-estimation phase of the generalized additive model, yielding a non-decreasing monotone function $f_k$. This ensures that the initial ranking of $P_{kborg}$ is maintained in the corresponding predictions of $P_{kbreal}$ in fold k. In practice, the training set is sorted by $P_{kborg}$. The rule-based algorithm then detects all violations of non-decreasing monotonicity in the predicted values $f_k(P_{kborg})$ on the training set. For instance, if the predicted value for bin X+1 is lower than the predicted value for bin X, the rule-based algorithm adds a rule to the rule base that changes the predicted value of bin X+1 to the larger predicted value of bin X. In the end, the generalized additive model and the rule base together describe a non-decreasing monotone GAM-based function $f_k$ with the following property (Denlinger, 2010):

$$P_{kborg_X} \le P_{kborg_{X+1}} \;\Rightarrow\; f_k(P_{kborg_X}) \le f_k(P_{kborg_{X+1}}) \qquad (7)$$

with $P_{kborg_X}$ and $P_{kborg_{X+1}}$ the original non-calibrated posterior probabilities for bins X and X+1 in the training set, and $f_k(P_{kborg_X})$ and $f_k(P_{kborg_{X+1}})$ the calibrated posterior probabilities in fold k for bins X and X+1.
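The rule-base correction described above amounts to a running maximum over the bin-level GAM predictions sorted by $P_{kborg}$. The sketch below assumes these predictions are already available as a plain list; the function names are hypothetical. How the rule base is applied to new, unseen probabilities is not detailed in the text, so the hypothetical helper apply_monotone uses one possible choice: the corrected value of the last training bin at or below each new probability, which keeps the mapping non-decreasing.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def monotone_rules(pred_sorted: Sequence[float]) -> Tuple[List[float], List[Tuple[int, float, float]]]:
    """Enforce f_k(bin X) <= f_k(bin X+1) on predictions sorted by P_kborg.

    Returns the corrected predictions and the rule base as
    (bin index, original prediction, corrected prediction) triples.
    """
    corrected: List[float] = []
    rules: List[Tuple[int, float, float]] = []
    for j, value in enumerate(pred_sorted):
        if corrected and value < corrected[-1]:
            # Violation: lift bin j to the (already corrected) value of bin j-1.
            rules.append((j, value, corrected[-1]))
            value = corrected[-1]
        corrected.append(value)
    return corrected, rules


def apply_monotone(p_new: Sequence[float], x_bins_sorted: Sequence[float],
                   corrected: Sequence[float]) -> List[float]:
    """Hypothetical step-function lookup: use the corrected value of the last
    training bin whose P_kborg does not exceed the new probability."""
    out = []
    for p in p_new:
        j = max(bisect_right(x_bins_sorted, p) - 1, 0)
        out.append(corrected[j])
    return out
```

For example, predictions [0.1, 0.3, 0.2, 0.25, 0.5] trigger rules for bins 2 and 3, both lifted to 0.3, so the corrected sequence satisfies property (7).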
4. Empirical validation
The calibration methods are applied to a test bed of 9 real-life direct marketing datasets provided by a large European financial institution. Each dataset corresponds to a typical financial product. Table 1 shows the characteristics of the response datasets.

[INSERT TABLE 1 OVER HERE]

To compare the different algorithms methodologically, a 10-fold cross-validation is applied. The classifier C, which is trained on $\mathrm{TRAIN}_M$ and applied to the test set to output $P_{org}$, is a logistic regression with forward variable selection, because it is a robust and well-known classification technique in the marketing environment (Neslin et al., 2006). The calibration approaches based on generalized additive models use different levels of degrees of freedom (df), which represent the non-linearity of the model: the higher the df, the higher the non-linearity. On the one hand, the df can be set manually by the researcher (user-specified); on the other hand, they can be estimated simultaneously with the shape of the response function (automatic). This study sets the df manually to {3, 4, 5} (resulting in GAMdf and GAMdf MONO). This df range is inspired by the recommendations and applications in Hastie & Tibshirani (1990) and Hastie et al. (2001), which use a relatively small number of df to account for different levels of non-linearity. Additionally, the generalized cross-validation (GCV) procedure is employed to select the ideal number of df automatically, resulting in GAMgcv and GAMgcv MONO (Gu & Wahba, 1991; Wood, 2000; 2004). The number of bins b for the test set and REAL is set to 200. Furthermore, $P_{org}$, the non-calibrated posterior probabilities of the test set, are used as a benchmark (ORIGINAL). The different algorithms are compared at the individual customer level using the log-likelihood (LL):

$$LL = \ln\left(\prod_{i=1}^{N} p(x_i)^{y_i}\,\bigl(1 - p(x_i)\bigr)^{1 - y_i}\right) = \sum_{i=1}^{N}\Bigl[y_i \ln p(x_i) + (1 - y_i)\ln\bigl(1 - p(x_i)\bigr)\Bigr] \qquad (8)$$

with N the number of customers, $p(x_i)$ equal to $P_{knew}$, the calibrated posterior response probability, and $y_i \in \{0, 1\}$ the real response variable.
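Equation (8) is straightforward to compute. The sketch below, an illustration under stated assumptions rather than the authors' code, also re-implements the SAERENS rescaling of equation (1) and scores it in a k-fold loop so that it can be compared with ORIGINAL. The helper names are hypothetical, probabilities are clipped away from 0 and 1 to avoid ln 0, and estimating the new prior of each held-out fold from the real responses of the other k-1 folds is an assumption about how the rescaling is embedded in the cross-validation.

```python
import math
import random
from typing import List, Sequence


def log_likelihood(y: Sequence[int], p: Sequence[float], eps: float = 1e-12) -> float:
    """Equation (8): sum_i y_i ln p_i + (1 - y_i) ln(1 - p_i); higher is better."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)  # clip to avoid ln(0)
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total


def saerens_rescale(p_org: Sequence[float], old_prior: float,
                    new_prior: float) -> List[float]:
    """Equation (1): reweight P_org by the ratio of new to old class priors."""
    w1 = new_prior / old_prior                  # P_k(c1) / P_kt(c1)
    w0 = (1.0 - new_prior) / (1.0 - old_prior)  # P_k(c0) / P_kt(c0)
    return [w1 * p / (w0 * (1.0 - p) + w1 * p) for p in p_org]


def kfold_ids(n: int, k: int, seed: int = 0) -> List[int]:
    """Randomly assign n customers to k folds of (nearly) equal size."""
    ids = [i % k for i in range(n)]
    random.Random(seed).shuffle(ids)
    return ids


def cv_saerens_ll(p_org: Sequence[float], y_real: Sequence[int],
                  old_prior: float, k: int = 10, seed: int = 0) -> float:
    """Cross-validated LL of SAERENS, summed over the k held-out folds."""
    if k < 2:
        raise ValueError("k must be at least 2")
    folds = kfold_ids(len(p_org), k, seed)
    total = 0.0
    for f in range(k):
        others = [y for y, g in zip(y_real, folds) if g != f]
        held_out = [i for i, g in enumerate(folds) if g == f]
        new_prior = sum(others) / len(others)  # real incidence outside fold f
        p_new = saerens_rescale([p_org[i] for i in held_out], old_prior, new_prior)
        total += log_likelihood([y_real[i] for i in held_out], p_new)
    return total
```

For example, if the classifier was trained at an incidence of 0.9 while half of the customers really respond, rescaling pulls every score of 0.9 back towards 0.5, and cv_saerens_ll([0.9] * 100, [1] * 50 + [0] * 50, old_prior=0.9) is higher than the ORIGINAL score log_likelihood([1] * 50 + [0] * 50, [0.9] * 100).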
The LL is a well-known
metric in (direct) marketing for evaluating the performance of an algorithm (e.g. Baumgartner & Hruschka, 2005): the higher the LL, the better the posterior probabilities are calibrated to the true response distribution. Moreover, the non-parametric Friedman test (Demšar, 2006; Friedman, 1937, 1940) with the Bonferroni-Dunn test (Dunn, 1961) is used to test whether the different approaches differ significantly from the best-performing algorithm.

5. Results
Table 2 presents the 10-fold cross-validated log-likelihood values for the different datasets and algorithms. Three panels (a, b, c) are included, representing the various levels of user-selected degrees of freedom for the generalized additive model mappings. For each dataset, the best-performing algorithm in terms of log-likelihood is put in italics. Moreover, the average ranking (AR) of each algorithm over the different datasets is given; the lower the ranking, the better the algorithm. The best-performing algorithm is underlined and set in bold, while the algorithms that are not significantly different from the best one at a 5% significance level are set in bold only.

[INSERT TABLE 2 OVER HERE]

The algorithms are split into 4 categories: the original, non-calibrated posterior probabilities (ORIGINAL), the rescaling methodology (SAERENS), the linear probability-mapping approach (GLM) and the non-linear probability-mapping approaches (LOG, GAMdf, GAMdf MONO, GAMgcv and GAMgcv MONO). Table 2 reveals that calibrating the posterior probabilities has a beneficial impact when the true prior probabilities of the training set and the test set differ: ORIGINAL always performs worse than the other calibration approaches.
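The average rankings (AR) reported in the tables and the Friedman/Bonferroni-Dunn comparison with the best algorithm can be sketched as follows, using the formulas summarized by Demšar (2006). This is an illustrative sketch: the function names are hypothetical, the input is a matrix of LL values (one row per dataset, one column per algorithm, higher is better), and the Bonferroni-Dunn critical value is computed as the standard normal quantile at 1 - alpha / (2(k-1)). The Friedman p-value would come from a chi-square distribution with k-1 degrees of freedom, which is not computed here.

```python
import math
from statistics import NormalDist
from typing import List, Sequence


def average_ranks(scores: Sequence[Sequence[float]]) -> List[float]:
    """scores[d][j]: LL of algorithm j on dataset d (higher is better).
    Rank 1 is best; tied algorithms share the average of their ranks."""
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        start = 0
        while start < k:
            end = start
            while end + 1 < k and row[order[end + 1]] == row[order[start]]:
                end += 1
            shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
            for t in range(start, end + 1):
                totals[order[t]] += shared
            start = end + 1
    return [t / len(scores) for t in totals]


def friedman_chi2(avg_ranks: Sequence[float], n_datasets: int) -> float:
    """Friedman statistic computed from average ranks (chi-square, k-1 df)."""
    k = len(avg_ranks)
    return 12.0 * n_datasets / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4.0)


def bonferroni_dunn_cd(k: int, n_datasets: int, alpha: float = 0.05) -> float:
    """Critical difference in average rank for k-1 comparisons with one control."""
    q = NormalDist().inv_cdf(1.0 - alpha / (2 * (k - 1)))
    return q * math.sqrt(k * (k + 1) / (6.0 * n_datasets))


def not_worse_than_best(avg_ranks: Sequence[float], cd: float) -> List[int]:
    """Indices whose average rank lies within cd of the best (lowest) rank."""
    best = min(avg_ranks)
    return [j for j, r in enumerate(avg_ranks) if r - best <= cd]
```

With the 9 datasets of Table 1, n_datasets would be 9; the algorithms returned by not_worse_than_best correspond to those that would be set in bold.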
Comparing the rescaling approach (SAERENS) with the best-performing calibration approaches, one concludes that SAERENS always performs significantly worse than the non-linear probability-mapping approaches, while it performs better than the linear probability-mapping approach (GLM). These results show that the analyst should shift towards a non-linear probability-mapping approach, even though SAERENS is an easy and workable solution to the calibration problem. Contrasting the various probability-mapping approaches, Table 2 shows that the non-linear calibration approaches (LOG, GAMdf, GAMdf MONO, GAMgcv and GAMgcv MONO) are always amongst the best-performing algorithms. The linear mapping approach (GLM) is never significantly competitive with any of its non-linear counterparts. However, the generalized linear model with log transformation (LOG) is competitive with the more advanced GAM approaches (GAMdf, GAMdf MONO, GAMgcv and GAMgcv MONO). Within the non-linear calibration setting, GAMgcv MONO always performs best, followed by the other non-linear calibration approaches.

Table 3 contains the performance measures of all generalized additive model approaches (GAMdf, GAMdf MONO, GAMgcv and GAMgcv MONO) for all levels of degrees of freedom. For each dataset, the best-performing algorithm is put in italics. Furthermore, the average ranking (AR) of each algorithm is given; the best-performing algorithm (i.e. the one with the lowest ranking) is underlined and set in bold, while those that are not significantly different from the best at a 5% significance level are set in bold only.

[INSERT TABLE 3 OVER HERE]

Table 3 reveals that GAM5 MONO is the best-performing algorithm amongst the GAM and GAM MONO approaches, closely followed by GAMgcv MONO. Table 3 also shows that the GAM approaches perform better as the number of df increases: GAM3 performs worse than GAM4, and GAM4 performs worse than GAM5.
Furthermore, including the monotonicity constraint clearly has a beneficial impact on
the calibration performance of the GAM approaches. The average ranking of the GAM approaches including the monotonicity constraint is always better than that of their original GAM counterparts (i.e. GAMdf versus GAMdf MONO and GAMgcv versus GAMgcv MONO). Moreover, the automatic smoothing parameter selection procedure proves beneficial. For the models without monotonicity constraint, GAMgcv always ranks better than the GAMdf approaches. For the monotonicity models, GAMgcv MONO always performs better than GAM3 MONO and GAM4 MONO, and is very competitive with GAM5 MONO.

6. Discussion
The results suggest that marketing analysts should calibrate the posterior probabilities when the training set does not represent the true prior distribution. In general, calibrating the posterior probabilities is more beneficial than using the non-calibrated posterior probabilities. Moreover, a simple rescaling algorithm (SAERENS) that takes into account the ratio of the old and new priors is not sufficient as a first, workable solution to the calibration problem: SAERENS always performs significantly worse than the more complex non-linear probability-mapping approaches. Furthermore, marketing researchers should not apply the linear probability-mapping approach in this specific setting. Indeed, amongst the different probability-mapping approaches, non-linear approaches are preferable to linear mappings. The LOG approach is competitive with the more complex GAM-based calibration approaches, and because it is based on the common generalized linear model framework, LOG can be seen as a first, workable approach. However, if one is interested in optimizing calibration performance, the GAM-based approaches are preferable. Finally, using the automatic smoothing parameter selection procedure and imposing a monotonicity constraint are the preferred options for GAM models to optimize calibration performance.

7. Conclusion
Direct marketing receives considerable attention these days in academia as well as in business, due to a sharp drop in the cost of IT equipment and the ever-increasing usage of response
models in a variety of business settings. In a direct marketing context, a problematic discrepancy sometimes exists between the prior distributions of the training set and the scoring set. This may happen because the training set consists entirely of customers previously selected by a response model, and therefore contains a higher percentage of responders. Applying a classification model built on this training set to the complete set of customers will harm the estimation of the response probabilities. Carefully adjusting the posterior probabilities to the real response probability distribution will improve the classification performance. This study reveals that the non-linear probability-mapping approaches are amongst the best-performing algorithms, and their use is highly recommended in day-to-day business settings for the following reasons. First, the non-linear probability-mapping approaches deliver better performance than the other calibration algorithms included in this paper, so the calibrated probabilities better reflect the true probabilities of response. Second, the relationship between $P_{kborg}$ and $P_{kbreal}$ can be visualized. This gives managers a better, visual understanding of the calibration process in a particular setting. For instance, the further the calibration curve lies from the 45-degree line (i.e. the line where $P_{kborg} = P_{kbreal}$, where no calibration is necessary), the higher the added value of sending a leaflet, because the incidence in $\mathrm{TRAIN}_M$ is higher than in REAL. Finally, the underlying techniques, such as generalized linear models and generalized additive models, are easily implementable in today's business environment because they are available in traditional software packages like SAS and R.

Whilst we are confident that our study adds significant value to the literature, valuable directions for future research can be identified.
Besides the probability-mapping approaches, which map $P_{kborg}$ onto $P_{kbreal}$, an extensive research project could investigate the impact of integrated calibration approaches, i.e. methods that integrate the calibration process into the initial training phase of classifier C in order to obtain a new classifier that directly outputs calibrated probabilities. For instance, a workable integrated calibration approach could be a two-stage Bayesian logistic regression that directly outputs calibrated posterior probabilities. To obtain this integrated Bayesian calibration model, the following procedure is proposed. Under the assumption that the commonly-used prior distribution for $\beta_k$ is multivariate Gaussian, i.e. $p(\beta_k) \sim \mathcal{N}(\beta_0, \Sigma_0)$, the empirical Bayesian approach could be used to specify the values of
$\beta_0$ and $\Sigma_0$ by fitting a Bayesian logistic regression with non-informative priors to $\mathrm{TRAIN}_{km}$. The resulting posterior mean vector and variance-covariance matrix of this initial model could then be used as the values of $\beta_0$ and $\Sigma_0$ for a second Bayesian logistic regression on REAL. The resulting integrated Bayesian logistic regression approach would directly output adapted, calibrated posterior probabilities.¹ Furthermore, the probability-mapping approaches are validated in a direct marketing setting; future research could investigate their external validity in other operational research settings.

Acknowledgements
The authors would like to thank the anonymous company for freely distributing the datasets. We would like to thank our friendly reviewers and the journal reviewers for their fruitful comments on earlier versions of this paper, and the editor, Jesus Artalejo, for guiding this paper through the reviewing process.

¹ Nevertheless, this approach is not tested in the current version of the paper for confidentiality reasons.
References
Allenby, G. M., Leone, R. P., & Jen, L. C. (1999). A dynamic model of purchase timing with application to direct marketing. Journal of the American Statistical Association, 94, 365-374.
Baesens, B., Viaene, S., Van den Poel, D., Vanthienen, J., & Dedene, G. (2002). Bayesian neural network learning for repeat purchase modeling in direct marketing. European Journal of Operational Research, 138, 191-211.
Baumgartner, B., & Hruschka, H. (2005). Allocation of catalogs to collective customers based on semiparametric response models. European Journal of Operational Research, 162, 839-849.
Bose, I., & Chen, X. (2009). Quantitative models for direct marketing: A review from systems perspective. European Journal of Operational Research, 195, 1-16.
Bosio, S., & Righini, G. (2007). Computational approaches to a combinatorial optimization problem arising from text classification. Computers & Operations Research, 34, 1910-1928.
Bult, J. R. (1993). Semiparametric Versus Parametric Classification Models: An Application to Direct Marketing. Journal of Marketing Research, 30, 380-390.
Conforti, D., & Guido, R. (2010). Kernel based support vector machine via semidefinite programming: Application to medical diagnosis. Computers & Operations Research, 37, 1389-1394.
Crone, S. F., Lessmann, S., & Stahlbock, R. (2006). The impact of preprocessing on data mining: An evaluation of classifier sensitivity in direct marketing. European Journal of Operational Research, 173, 781-800.
Deichmann, J., Eshghi, A., Haughton, D., Sayek, S., & Teebagy, N. (2002). Application of multiple adaptive regression splines (MARS) in direct response modeling. Journal of Interactive Marketing, 16, 15-27.
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1-30.
Denlinger, C.G. (2010). Elements of real analysis. Jones and Bartlett Publishers.
Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56, 52-64.
Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32, 675-701.
Friedman, M. (1940). A comparison of alternative tests of significance for the problem of m rankings. The Annals of Mathematical Statistics, 11, 86-92.
Green, P.J. & Silverman, B.W. (1994). Nonparametric regression and generalized linear models. Chapman and Hall/CRC Press.
Gu, C., & Wahba, G. (1991). Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM Journal of Scientific and Statistical Computing, 12, 383-398.
Hastie, T., & Tibshirani, R. (1986). Generalized additive models. Statistical Science, 1, 297-318.
Hastie, T., & Tibshirani, R. (1987). Generalized Additive Models: Some applications. Journal of the American Statistical Association, 82, 371-386.
Hastie, T., & Tibshirani, R. (1990). Generalized Additive Models. London: Chapman and Hall.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference and Prediction. New York: Springer-Verlag.
Haughton, D., & Oulabi, S. (1997). Direct marketing modeling with CART and CHAID. Journal of Direct Marketing, 11, 42-52.
Hruschka, H. (2010). Considering endogeneity for optimal catalog allocation in direct marketing. European Journal of Operational Research, 206, 239-247.
Kim, H.S., & Sohn, S.Y. (2010). Support vector machines for default prediction of SMEs based on technology credit. European Journal of Operational Research, 201, 838-846.
Lamb, C. W., Hair, J. F., & McDaniel, C. (1994). Principles of marketing (second ed.). Cincinnati: South-Western Publishing Co.
Lee, H. J., Shin, H., Hwang, S. S., Cho, S., & MacLachlan, D. (2010). Semi-Supervised Response Modeling. Journal of Interactive Marketing, 24, 42-54.
Martens, D., Baesens, B., Van Gestel, T., & Vanthienen, J. (2007). Comprehensible credit scoring models using rule extraction from support vector machines. European Journal of Operational Research, 183, 1466-1476.
Martens, D., Van Gestel, T., De Backer, M., Haesen, R., Vanthienen, J., & Baesens, B. (2010). Credit rating prediction using Ant Colony Optimization. Journal of the Operational Research Society, 61, 561-573.
Morales, D. R., & Wang, J. B. (2010). Forecasting cancellation rates for services booking revenue management using data mining. European Journal of Operational Research, 202, 554-562.
Naik, P. A., Hagerty, M. R., & Tsai, C. L. (2000). A new dimension reduction approach for data-rich marketing environments: Sliced inverse regression. Journal of Marketing Research, 37, 88-101.
Neslin, S. A., Gupta, S., Kamakura, W., Lu, J. X., & Mason, C. H. (2006). Defection detection: Measuring and understanding the predictive accuracy of customer churn models. Journal of Marketing Research, 43, 204-211.
Paleologo, G., Elisseeff, A., & Antonini, G. (2010). Subagging for credit scoring models. European Journal of Operational Research, 201, 490-499.
Piersma, N., & Jonker, J.J. (2004). Determining the optimal direct mailing frequency. European Journal of Operational Research, 158, 173-182.
Saerens, M., Latinne, P., & Decaestecker, C. (2002). Adjusting the outputs of a classifier to new a priori probabilities: A simple procedure. Neural Computation, 14, 21-41.
Tabachnick, B. G. & Fidell, L. S. (1996). Using multivariate statistics. HarperCollins Publishers, New York.
Wahba, G. (1990). Spline models for observational data. Society for Industrial and Applied Mathematics (SIAM) Capital City Press, Montpelier (Vermont).
Wood, S.N. (2000). Modelling and Smoothing Parameter Estimation with Multiple Quadratic Penalties. Journal of the Royal Statistical Society B, 62, 413-428.
Wood, S.N. (2004). Stable and efficient multiple smoothing parameter estimation for generalized additive models. Journal of the American Statistical Association, 99, 673-686.
Wood, S.N. (2008). Fast stable direct fitting and smoothness selection for generalized additive models. Journal of the Royal Statistical Society B, 70, 495-518.
Zahavi, J., & Levin, N. (1997). Applying neural computing to target marketing. Journal of Direct Marketing, 11, 76-93.
[Figure 1 diagram not reproduced. Labels: TRAIN^M, classifier C, k = 1 to 10, ORIGINAL, SAERENS, LIN, LOG, GAM, GAM MONO, NEW, REAL.]
Figure 1: Methodological framework.
             TRAIN^M                         REAL
Dataset ID   # customers    % responders    # customers    % responders    # variables used by C
1            70,463         1.29%           119,329        0.18%           10
2            56,301         2.40%           119,104        0.44%           16
3            23,328         7.57%           117,7433       0.14%           19
4            9,027          11.94%          305,567        0.57%           12
5            14,946         17.11%          1,073,346      0.18%           22
6            14,586         5.04%           1,223,703      0.05%           11
7            25,660         3.10%           748,602        0.18%           14
8            12,603         0.56%           127,651        0.24%           10
9            19,190         0.95%           113,496        0.23%           18
Table 1: Dataset characteristics.
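Table 1 shows how strongly the response rate of TRAIN^M exceeds that of REAL (for dataset 1, 1.29% versus 0.18%). The Python sketch below shows the standard prior-shift correction of Saerens, Latinne, & Decaestecker (2002): a closed form for when the new response rate is known, and their EM procedure for when it has to be estimated from unlabeled scores. Which variant lies behind the SAERENS column of Table 2 is not stated in this section, and the function names are hypothetical.

```python
def adjust_to_new_priors(p, prior_train, prior_new):
    """Prior-shift correction of a posterior P(response | x) (Saerens et al., 2002).

    p: posterior from a classifier trained where the response rate was prior_train.
    prior_new: response rate of the population that is actually scored.
    """
    w1 = prior_new / prior_train                  # re-weighting of the responder class
    w0 = (1.0 - prior_new) / (1.0 - prior_train)  # re-weighting of the non-responder class
    return w1 * p / (w1 * p + w0 * (1.0 - p))


def estimate_new_prior_em(posteriors, prior_train, iters=1000, tol=1e-12):
    """EM estimate of an unknown response rate from unlabeled posteriors.

    E-step: adjust every posterior to the current prior estimate.
    M-step: the new prior estimate is the mean adjusted posterior.
    """
    prior = prior_train
    for _ in range(iters):
        adjusted = [adjust_to_new_priors(p, prior_train, prior) for p in posteriors]
        new_prior = sum(adjusted) / len(adjusted)
        if abs(new_prior - prior) < tol:
            return new_prior
        prior = new_prior
    return prior


# Dataset 1 of Table 1: 1.29% responders in TRAIN^M, 0.18% in REAL.
print(round(adjust_to_new_priors(0.05, 0.0129, 0.0018), 4))  # → 0.0072
```

For dataset 1, a customer scored at 5% by a model trained on TRAIN^M is thus pulled down to roughly 0.72% once the REAL response rate is taken into account. This correction only rescales the odds by a constant factor; it cannot repair a miscalibrated shape of the score distribution, which is what the probability-mapping approaches address.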
Column groups (all panels): ORIGINAL; RESCALING: SAERENS; PROBABILITY-MAPPING, LINEAR: GLM, LOG; PROBABILITY-MAPPING, NON-LINEAR: the GAM and GAM MONO columns.

Panel a
DATASET  ORIGINAL   SAERENS    GLM        LOG        GAM3       GAMgcv     GAM3 MONO  GAMgcv MONO
1        -242.91    -179.07    -202.76    -178.02    -177.88    -180.22    -177.58    -177.40
2        -479.55    -306.81    -323.14    -304.73    -306.83    -304.23    -307.03    -303.60
3        -998.78    -1280.32   -1064.30   -980.99    -982.94    -980.74    -981.08    -979.81
4        -223.14    -206.79    -223.00    -206.29    -206.69    -207.16    -206.56    -206.61
5        -243.69    -140.90    -246.53    -140.36    -142.71    -145.56    -140.35    -139.96
6        -9884.39   -1192.41   -1189.09   -1173.78   -1165.40   -1163.68   -1165.18   -1163.46
7        -3823.20   -1032.46   -1025.02   -1016.21   -1017.24   -1016.52   -1016.53   -1016.04
8        -17802.90  -1290.20   -1297.41   -1294.45   -1291.47   -1290.74   -1292.57   -1291.86
9        -5493.35   -510.03    -525.81    -506.11    -523.01    -515.27    -507.20    -505.91
AR       7.67       5.00       6.89       3.22       4.44       3.78       3.44       1.56

Panel b
DATASET  ORIGINAL   SAERENS    GLM        LOG        GAM4       GAMgcv     GAM4 MONO  GAMgcv MONO
1        -242.91    -179.07    -202.76    -178.02    -178.06    -180.22    -177.40    -177.40
2        -479.55    -306.81    -323.14    -304.73    -305.68    -304.23    -305.68    -303.60
3        -998.78    -1280.32   -1064.30   -980.99    -983.81    -980.74    -980.17    -979.81
4        -223.14    -206.79    -223.00    -206.29    -206.48    -207.16    -206.37    -206.61
5        -243.69    -140.90    -246.53    -140.36    -146.48    -145.56    -140.02    -139.96
6        -9884.39   -1192.41   -1189.09   -1173.78   -1164.36   -1163.68   -1164.06   -1163.46
7        -3823.20   -1032.46   -1025.02   -1016.21   -1016.80   -1016.52   -1016.27   -1016.04
8        -17802.90  -1290.20   -1297.41   -1294.45   -1290.92   -1290.74   -1292.02   -1291.86
9        -5493.35   -510.03    -525.81    -506.11    -522.89    -515.27    -507.05    -505.91
AR       7.66       5.22       6.88       3.22       4.55       3.88       2.77       1.77

Panel c
DATASET  ORIGINAL   SAERENS    GLM        LOG        GAM5       GAMgcv     GAM5 MONO  GAMgcv MONO
1        -242.91    -179.07    -202.76    -178.02    -178.70    -180.22    -177.38    -177.40
2        -479.55    -306.81    -323.14    -304.73    -305.46    -304.23    -304.86    -303.60
3        -998.78    -1280.32   -1064.30   -980.99    -982.74    -980.74    -979.81    -979.81
4        -223.14    -206.79    -223.00    -206.29    -206.52    -207.16    -206.33    -206.61
5        -243.69    -140.90    -246.53    -140.36    -149.81    -145.56    -139.94    -139.96
6        -9884.39   -1192.41   -1189.09   -1173.78   -1163.91   -1163.68   -1163.62   -1163.46
7        -3823.20   -1032.46   -1025.02   -1016.21   -1016.59   -1016.52   -1016.11   -1016.04
8        -17802.90  -1290.20   -1297.41   -1294.45   -1290.75   -1290.74   -1291.86   -1291.86
9        -5493.35   -510.03    -525.81    -506.11    -522.79    -515.27    -506.63    -505.91
AR       7.66       5.11       6.88       3.33       4.66       4.00       2.33       1.88

* 10-fold CV LL values, AR = average ranking
Table 2: The 10-fold cross-validated log-likelihood values. Panel a: overview with GAM3 & GAM3 MONO. Panel b: overview with GAM4 & GAM4 MONO. Panel c: overview with GAM5 & GAM5 MONO.
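Each entry of Table 2 is a log-likelihood (higher, i.e. less negative, is better), and AR averages each method's rank over the nine datasets. The Python sketch below computes both quantities. It assumes that rank 1 goes to the highest log-likelihood and that tied methods share their mean rank, since the table does not state its tie rule; how the paper aggregates the ten folds into one value is not stated in this section either. The helper names are hypothetical.

```python
import math


def log_likelihood(y, p, eps=1e-15):
    """Bernoulli log-likelihood of 0/1 responses y under predicted probabilities p."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)  # guard against log(0)
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total


def ranks_desc(values):
    """Rank 1 = largest value (best log-likelihood); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda j: -values[j])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def average_ranks(table):
    """Column-wise average of the per-row ranks (one row per dataset)."""
    totals = [0.0] * len(table[0])
    for row in table:
        for col, r in enumerate(ranks_desc(row)):
            totals[col] += r
    return [t / len(table) for t in totals]


# Table 2, Panel a: ORIGINAL, SAERENS, GLM, LOG, GAM3, GAMgcv, GAM3 MONO, GAMgcv MONO
PANEL_A = [
    [-242.91, -179.07, -202.76, -178.02, -177.88, -180.22, -177.58, -177.40],
    [-479.55, -306.81, -323.14, -304.73, -306.83, -304.23, -307.03, -303.60],
    [-998.78, -1280.32, -1064.30, -980.99, -982.94, -980.74, -981.08, -979.81],
    [-223.14, -206.79, -223.00, -206.29, -206.69, -207.16, -206.56, -206.61],
    [-243.69, -140.90, -246.53, -140.36, -142.71, -145.56, -140.35, -139.96],
    [-9884.39, -1192.41, -1189.09, -1173.78, -1165.40, -1163.68, -1165.18, -1163.46],
    [-3823.20, -1032.46, -1025.02, -1016.21, -1017.24, -1016.52, -1016.53, -1016.04],
    [-17802.90, -1290.20, -1297.41, -1294.45, -1291.47, -1290.74, -1292.57, -1291.86],
    [-5493.35, -510.03, -525.81, -506.11, -523.01, -515.27, -507.20, -505.91],
]

print([round(r, 2) for r in average_ranks(PANEL_A)])
# → [7.67, 5.0, 6.89, 3.22, 4.44, 3.78, 3.44, 1.56]
```

Panel a contains no ties, so this reproduces its AR row exactly; the other panels and Table 3 contain tied values, where the tie rule matters.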
NON-LINEAR
DATASET  GAM3       GAM3 MONO  GAM4       GAM4 MONO  GAM5       GAM5 MONO  GAMgcv     GAMgcv MONO
1        -177.88    -177.58    -178.06    -177.40    -178.70    -177.38    -180.22    -177.40
2        -306.83    -307.03    -305.68    -305.68    -305.46    -304.86    -304.23    -303.60
3        -982.94    -981.08    -983.81    -980.17    -982.74    -979.81    -980.74    -979.81
4        -206.69    -206.56    -206.48    -206.37    -206.52    -206.33    -207.16    -206.61
5        -142.71    -140.35    -146.48    -140.02    -149.81    -139.94    -145.56    -139.96
6        -1165.40   -1165.18   -1164.36   -1164.06   -1163.91   -1163.62   -1163.68   -1163.46
7        -1017.24   -1016.53   -1016.80   -1016.27   -1016.59   -1016.11   -1016.52   -1016.04
8        -1291.47   -1292.57   -1290.92   -1292.02   -1290.75   -1291.86   -1290.74   -1291.86
9        -523.01    -507.20    -522.89    -507.05    -522.79    -506.63    -515.27    -505.91
AR       6.55       5.55       5.88       3.66       5.22       2.11       4.55       2.33

* 10-fold CV LL values, AR = average ranking
Table 3: The 10-fold cross-validated log-likelihood values for the GAM and GAM MONO calibration models.
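The reference list cites Friedman (1937, 1940) and Demšar (2006), whose procedure tests whether average ranks such as those in Table 3 differ by more than chance would allow. The Python sketch below computes the Friedman statistic directly from average ranks, together with the Iman-Davenport F correction described by Demšar (2006). It is an illustration only, not a reproduction of the paper's own tests, and the function names are hypothetical.

```python
def friedman_chi2(avg_ranks, n_datasets):
    """Friedman statistic from average ranks R_j of k methods over N datasets.

    chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4), with k - 1 degrees of freedom.
    """
    k = len(avg_ranks)
    return 12.0 * n_datasets / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4.0
    )


def iman_davenport_f(chi2, n_datasets, k):
    """Iman-Davenport statistic, F-distributed with (k - 1, (k - 1)(N - 1)) degrees of freedom."""
    return (n_datasets - 1) * chi2 / (n_datasets * (k - 1) - chi2)


# Table 3 average ranks as reported (two decimals): GAM3, GAM3 MONO, GAM4, GAM4 MONO,
# GAM5, GAM5 MONO, GAMgcv, GAMgcv MONO
TABLE3_AR = [6.55, 5.55, 5.88, 3.66, 5.22, 2.11, 4.55, 2.33]
chi2 = friedman_chi2(TABLE3_AR, n_datasets=9)
print(round(chi2, 1), round(iman_davenport_f(chi2, 9, len(TABLE3_AR)), 1))  # → 26.3 5.7
```

With N = 9 datasets and k = 8 models, a χ² value of about 26.3 lies above the 5% critical value of the χ² distribution with 7 degrees of freedom (about 14.07). Because the reported AR values are rounded and do not sum exactly to k(k+1)/2 = 36, the figure should be read as approximate.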