A PROBABILITY-MAPPING ALGORITHM FOR CALIBRATING THE POSTERIOR PROBABILITIES: A DIRECT MARKETING APPLICATION
Document de travail du LEM (LEM working paper)

Kristof Coussement*, Wouter Buckinx**
* IESEG School of Management (LEM-CNRS)
** Python Predictions, Brussels, Belgium
Kristof Coussement: IESEG School of Management (LEM-CNRS), Department of Marketing, 3 Rue de la Digue, F-59000 Lille, France.
Wouter Buckinx: Managing partner (PhD), Python Predictions, Avenue R. Van den Driessche 9, B-1150 Brussels, Belgium.
First and corresponding author: Kristof Coussement, [email protected]. Second author: Wouter Buckinx, [email protected]

This paper is accepted for publication in European Journal of Operational Research.
Abstract

Calibration refers to the adjustment of the posterior probabilities output by a classification algorithm towards the true prior probability distribution of the target classes. This adjustment is necessary to account for the difference in prior distributions between the training set and the test set. This article proposes a new calibration method, called the probability-mapping approach. Two types of mapping are proposed: linear and non-linear probability mapping. These new calibration techniques are applied to 9 real-life direct marketing datasets. The newly proposed techniques are compared with the original, non-calibrated posterior probabilities and with the adjusted posterior probabilities obtained using the rescaling algorithm of Saerens, Latinne, & Decaestecker (2002). The results indicate that marketing researchers should calibrate the posterior probabilities obtained from the classifier. Moreover, a simple rescaling algorithm is shown not to be sufficient as a first, workable solution: the results suggest applying the newly proposed non-linear probability-mapping approach for the best calibration performance.

Keywords: data mining, direct marketing, response modeling, calibration, decision support systems
1. Introduction

Due to recent developments in IT infrastructure and the ever-increasing trust placed in complex computer systems, analysts are showing an increasing interest in classification modeling in a variety of disciplines such as credit scoring (Martens et al., 2010; Paleologo et al., 2010), medicine (Conforti & Guido, 2010), text classification (Bosio & Righini, 2007), SME fund management (Kim and Sohn, 2010), revenue management (Morales & Wang, 2010), and so on. The same interests are shared by the direct marketing community. Direct marketing analysts have an increasing interest in building prediction models that assign a probability of response to each individual customer in the database (Lamb et al., 1994). The task of classification is made even more interesting by the fact that current marketing environments store enormous amounts of customer information at a very low cost, including socio-demographics, transactional buying behavior, attitudinal data, etc. (Naik et al., 2000), while at the same time there has been a tremendous increase in academic interest in direct marketing applications (e.g. Allenby et al., 1999; Baumgartner & Hruschka, 2005; Hruschka, 2010; Lee et al., 2010; Piersma & Jonker, 2004). In this setting, response models are defined as classification models that attempt to discriminate between responders and non-responders to a certain company mailing. In the past, purely statistical methods like logistic regression, discriminant analysis and naive Bayes models have been proposed to discriminate between responders and non-responders in a direct marketing context (Baesens et al., 2002; Bult, 1993; Deichmann et al., 2002). Although these techniques may be very effective, they make a stringent assumption about the underlying relationship between the independent variables and the dependent or response variable.
In response to this, more advanced data mining algorithms like decision tree-generating techniques, artificial neural networks and support vector machines have been applied (Baesens et al., 2002; Bose & Chen, 2009; Crone et al., 2006; Haughton & Oulabi, 1997; Zahavi & Levin, 1997). All these binary classification models are used for two reasons. First, researchers rely on them to obtain robust parameter estimates of the independent variables by modeling the probability of response as a function of the independent variables. Second, these models are used to obtain consistent predicted probabilities of response, which are then used (i) to rank the customers based on their
responsiveness to the campaign, (ii) to optimize the overall campaign strategy by offering the customer the product with the highest response probability over the different response models and (iii) for the discrimination task of the response event itself, where one classifies customers into responders and non-responders. For (ii) and (iii), the absolute size of the posterior response probabilities is crucial. This study focuses on the process of obtaining correct response probabilities, where calibrating the posterior probabilities could have a positive impact on the optimization of the overall campaign strategy and the efficiency of the discrimination task.

In practice, a classification model is built on a training set, i.e. a set of customers where both the independent variables and the dependent variable are present. In order to correctly measure the discrimination power of the trained classifier, the classification model is applied to a group of customers who have not been used for training, called the scoring or test set. The purpose is to obtain robust and consistent predictions for the response probability of these unseen customers. As one is interested in dividing the customers into responders and non-responders, a judicious classification based on the posterior response probabilities of the customers is needed. In other words, customers having a response probability exceeding a certain threshold will be classified as responders, and the others as non-responders. However, it often happens that a classifier is trained using a dataset that does not reflect the true prior probabilities of the target classes in the real-life population. This may have serious negative consequences for the discrimination performance because the posterior probabilities do not reflect the true probability of interest. This phenomenon occurs in a direct marketing context as well, where the prior probabilities of the training set and the (out-of-sample) test set are significantly different.
More specifically, the training set consists of customers who are preselected by an earlier response model as being customers with a high response probability, while the test set does not make any restrictions based on the customer profiles in the database. In such a case, a large discrepancy exists between the response distributions on the training set and the test set. The incidence, which is the percentage of responders in a data set, is much higher in the training set than the incidence of real response in the out-of-sample test set. This inconsistency has a negative effect on the discrimination performance on the test set, especially because the classifier's decision to classify customers into
responders or non-responders is based on setting a threshold on the raw posterior probabilities of class membership. For instance, when a classifier is trained on a dataset with a higher incidence than the one in the test set, the posterior probabilities on the test set are inflated. Thus, making a classification decision based on the absolute value of the posterior probabilities may significantly harm the discrimination performance. Moreover, optimizing the campaign strategy by offering the product with the highest response probability to the customer becomes useless because the response probabilities for different products for a particular customer are not comparable.

This paper focuses on how researchers can adjust the posterior probabilities based on the true prior distribution of the response variable. This process of adjustment is called calibration. This paper proposes a new methodology to calibrate the posterior probabilities from the test set with the real-world situation, a process called probability mapping. It maps the posterior response probabilities obtained from the classifier onto the prior distribution of real response. The new probability-mapping approaches using generalized linear models and non-parametric generalized additive models are compared with the original, non-calibrated posterior probabilities and the calibrated probabilities using the rescaling methodology of Saerens et al. (2002).

This paper is structured as follows. Section 2 describes the methodological framework, while Section 3 explores the different calibration approaches (rescaling approaches and probability-mapping approaches). Section 4 explains the characteristics of the empirical validation, while Section 5 explores the results. Section 6 gives managerial recommendations, and finally Section 7 concludes this paper.

2. Methodological framework

Figure 1 shows the methodological framework for the different calibration methods applied in this study.
[INSERT FIGURE 1 OVER HERE]

Define a training set $TRAIN_M = \{(x_i, y_i)\}_{i=1}^{m}$ consisting of m customers. Each customer $(x_i, y_i)$ is a combination of an input vector $x_i$ representing the independent variables and a dependent variable $y_i$ with $y_i \in \{0, 1\}$ corresponding to whether or not a customer responded to a certain mailing. $TRAIN_M$ consists of all customers who were selected by a previous response model, who thus received a direct mailing to buy the product, and who were therefore indicated as customers having a high response probability. During the training phase, a classifier C maps the input vector space onto the binary response variable using the training set observations. For the test set $N = \{x_i\}_{i=1}^{n}$ consisting of n customers, the trained classifier C is applied and for every customer in $N$ a response probability $P_{org}$ is obtained. The purpose of this paper is to adjust the posterior probabilities $P_{org}$ to the real response distribution, because the training sample $TRAIN_M$ is not representative of $N$, which corresponds to the true population. Therefore, for every observation $x_i$ in $N$, the real response is collected and summarized in $REAL = \{y_i\}_{i=1}^{n}$ with $y_i \in \{0, 1\}$ corresponding to whether or not the customer spontaneously bought that particular product in a time window without direct mailing actions. The real response represents a response of pure interest in the product. In other words, $REAL$ is used to represent the true prior probabilities. The purpose of the calibration phase is to adjust $P_{org}$, the non-calibrated posterior probabilities of $N$, in order to truly represent the probability of response.

With the aim of methodologically benchmarking the different calibration methods, a k-fold cross-validation is applied. In a k-fold cross-validation, the dataset is randomly split into k equal parts, each of which is used in turn during the scoring phase, while the other k-1 parts are used for training the calibration model. Note that $N_k$ ($REAL_k$) represents the k-th fold of $N$ ($REAL$), while $P_{korg}$ represents the non-calibrated posterior probabilities of $N_k$.

3. Calibration approaches
Two types of calibration methods are applied: (i) the rescaling algorithm of Saerens et al. (2002) and (ii) the newly proposed probability-mapping approaches. The former algorithm rescales $P_{korg}$, the posterior probabilities of $N_k$, taking into account the real incidence of $REAL_k$ (Saerens et al., 2002), while the latter type adjusts the posterior probabilities of $N_k$ by mapping them onto the real responses of $REAL_k$.

3.1 Rescaling algorithm (SAERENS)

This section explains the methodology of Saerens et al. (2002). The starting point of the Saerens et al. (2002) calibration approach is Bayes' rule, i.e. the posterior probabilities of response depend in a non-linear way on the prior probability distribution of the target classes. The prior probability distribution of the target class is defined as the incidence of the target class, or in this setting the percentage of responders in the dataset. Therefore, a change in the prior probability distribution of the target classes changes the posterior response probabilities of the classification model. Saerens et al. (2002) describe a process that adjusts the posterior probabilities of response output by the classifier to the new prior probability distribution of the target classes, making use of a predefined rescaling formula. In detail, the calibrated posterior probabilities of response for the customers in the test set of fold k are obtained by weighting the non-calibrated posterior probabilities, $P_{korg}$, by the ratio of the response incidence of $REAL_k$, i.e. the new prior probability distribution, to the response incidence in the training set, i.e. the old prior probability distribution. The denominator is a scaling factor to make sure that the calibrated posterior probabilities sum up to one. In summary,

$$P_{knew} = \frac{\dfrac{P_k(c_1)}{P_{kt}(c_1)}\, P_{korg}}{\dfrac{P_k(c_0)}{P_{kt}(c_0)}\,(1 - P_{korg}) + \dfrac{P_k(c_1)}{P_{kt}(c_1)}\, P_{korg}} \qquad (1)$$

with $P_{knew}$ representing the calibrated posterior response probabilities in fold k, and $P_k(c_i)$ and $P_{kt}(c_i)$ the new and old prior probabilities for class $i$ with $i \in \{0, 1\}$. A data set $NEW$ is
obtained which contains $P_{knew}$, the calibrated posterior probabilities for the test data of $N_k$.

3.2 Probability-mapping approaches

The purpose of the probability-mapping approaches is to map $P_{korg}$, the old posterior probabilities of $N_k$, onto the real response probabilities of $REAL_k$. As such, one is able to build a classification model that maps the non-calibrated probabilities onto the real response probabilities. This model is then used to replace the old probabilities with the corrected probabilities of response. However, the real probability distribution of the target classes is not directly available from $REAL$, which only contains the real responses $y_i$ with $y_i \in \{0, 1\}$ on an individual customer level. In order to convert the individual-level real responses in $REAL$ into a real response probability distribution, a number of bins b are constructed. The incidence of response is calculated per bin and equals the percentage of real response. This incidence is used as an approximation for the real probability of response per bin. In practice, both $N$ and $REAL$ are split into a number of bins b using the equal frequency binning approach based on the posterior probabilities of $N$. $N_{kb}$ ($REAL_{kb}$) represents the b-th bin in the k-th fold of $N$ ($REAL$ respectively). $N_{kb}$ and $REAL_{kb}$ logically contain identical observations, while $P_{kborg}$ is the non-calibrated posterior probability average for the b-th bin in $N_k$ and $P_{kbreal}$ is the percentage of real responders in the b-th bin of $REAL_k$. $P_{kbreal}$ serves as a proxy for the true prior probability. In order to formalize the relationship between the average posterior probabilities of $N_k$ and the approximate real probabilities obtained from $REAL_k$, a formal mapping is obtained using the binned training set of fold k by

$$P_{kbreal} = f_k(P_{kborg}) \qquad (2)$$

with $f_k$ being the classifier that maps the non-calibrated posterior probabilities onto the real probabilities in fold k. After the classifier $f_k$ is built, it is applied to the unseen test data of
$N_k$ to obtain the new posterior probabilities, $P_{knew}$, for every individual in the test data set of the k-th fold. A new data set $NEW$ is obtained which contains $P_{knew}$, the calibrated posterior probabilities.

There are several possibilities for $f_k$, a function that links the estimated, non-calibrated probabilities of $N_{kb}$ to the approximated real probabilities of $REAL_{kb}$. This study uses one probability-mapping approach based on generalized linear models (GLM) and three non-linear approaches: one based on generalized linear models with log-transformed non-calibrated probabilities (LOG) and two approaches based on generalized additive models, without and with a monotonicity constraint. Each approach is described in the correspondingly titled subsection.

Generalized linear model (GLM)

Given $y$ as the dependent variable with $y \in [0, 1]$ representing $P_{kbreal}$, the averaged true prior probabilities from $REAL_{kb}$, and $x$ equal to $P_{kborg}$, the averaged posterior probabilities of $N_{kb}$, a generalized linear model with logit link function is employed to model $f_k(x) \in [0, 1]$. Moreover, it assumes that the relationship between $P_{kborg}$ and $P_{kbreal}$ is linear in the log-odds via

$$\operatorname{logit}(y) = \log\frac{y}{1 - y} = \alpha_k + \beta_k x \qquad (3)$$

or

$$y = f_k(x) = \operatorname{logit}^{-1}(\alpha_k + \beta_k x) \qquad (4)$$

with $\alpha_k$ as the intercept and $\beta_k x$ as the predictor. The parameters $\alpha_k$ and $\beta_k$ are estimated using maximum likelihood (Tabachnick & Fidell, 1996).

Generalized linear model with log transformation (LOG)
Another approach is to log-transform $x$ in equation (3) and equation (4), because as such one captures the non-linearity in the log-odds space between $y$ ($P_{kbreal}$, the true prior probabilities from $REAL_{kb}$) and $x$ ($P_{kborg}$, the posterior probabilities of $N_{kb}$).

Generalized additive models

An attractive alternative to standard generalized linear models is generalized additive models (Hastie & Tibshirani, 1986, 1987, 1990). Generalized additive models relax the linearity constraint and apply a non-parametric non-linear fit to the data. In other words, the data themselves decide on the functional form between the independent variable and the dependent variable. Define $y$ as the dependent variable with $y \in [0, 1]$ representing $P_{kbreal}$, the true prior probabilities from $REAL_{kb}$, and $x$ equal to $P_{kborg}$, the posterior probabilities of $N_{kb}$. To model $f_k(x) \in [0, 1]$, generalized additive models with logit link function are employed. Methodologically, generalized additive models generalize the generalized linear model principle by replacing the linear predictor component $\beta_k x$ in equation (4) with an additive component:

$$y = f_k(x) = \operatorname{logit}^{-1}(\alpha_k + s_k(x)) \qquad (5)$$

with $s_k(x)$ as a smooth function. This study uses penalized regression splines $s_k(x)$ to estimate the non-parametric trend for the dependency of $y$ on $x$ (Wahba, 1990; Green and Silverman, 1994). These smooth functions use a large number of knots, leading to a model quite insensitive to the knot locations, while the penalty term is used to avoid the danger of over-fitting that would otherwise accompany the use of many knots. The complexity of the model is controlled by a parameter $\lambda$, which is inversely related to the degrees of freedom (df). If $\lambda$ is small (i.e. the df are large), a very complex model that closely matches the data is employed. When $\lambda$ is large (i.e. the df are small), a smooth model is considered. In order to optimize the generalized additive model, the fitting amounts to penalized likelihood maximization by penalized iteratively reweighted least squares (Wood, 2000; 2004; 2008).
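The inverse relation between $\lambda$ and the degrees of freedom can be illustrated with a deliberately simplified sketch. It is not the penalized iteratively reweighted least squares fit with a logit link described above: it assumes an identity link, a linear truncated-power basis and a ridge-type penalty on the knot coefficients, and it measures the effective df as the trace of the hat matrix. The names `spline_basis` and `fit_penalized_spline`, the knot grid and the toy data are all hypothetical.

```python
import numpy as np

def spline_basis(x, knots):
    """Linear truncated-power basis: 1, x, and (x - k)_+ for every knot k."""
    cols = [np.ones_like(x), x]
    cols += [np.clip(x - k, 0.0, None) for k in knots]
    return np.column_stack(cols)

def fit_penalized_spline(x, y, knots, lam):
    """Penalized least squares (identity link, an assumption of this sketch).

    Returns the fitted values and the effective degrees of freedom,
    computed as the trace of the hat matrix.
    """
    B = spline_basis(x, knots)
    # Penalize only the knot coefficients; intercept and slope stay free.
    D = np.diag([0.0] * 2 + [1.0] * len(knots))
    hat = B @ np.linalg.solve(B.T @ B + lam * D, B.T)
    return hat @ y, float(np.trace(hat))

# Toy binned data: original bin averages x and a non-linear "real" incidence y.
x = np.linspace(0.01, 0.60, 200)
y = 0.3 * np.sqrt(x)
knots = np.linspace(0.05, 0.55, 10)

fit_flexible, edf_small_lam = fit_penalized_spline(x, y, knots, lam=1e-3)
fit_smooth, edf_large_lam = fit_penalized_spline(x, y, knots, lam=1e2)
print("effective df, small lambda:", round(edf_small_lam, 2))
print("effective df, large lambda:", round(edf_large_lam, 2))
```

With a small penalty the knot coefficients are nearly free and the effective df approach the number of basis functions; with a large penalty they shrink towards zero and the fit approaches a straight line with 2 df.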
Generalized additive models with monotonicity constraint

Because generalized additive models produce a non-linear relationship between the independent variable $P_{kborg}$ and the dependent variable $P_{kbreal}$, the ranking of the original posterior probabilities of $N_k$ and the ranking of their calibrated version may differ. However, marketing analysts could argue that the mapping from $TRAIN_M$ onto $N$ and the corresponding ranking of the customers in $N$ given by the initial classifier C should be conserved. As such, a non-decreasing monotonicity constraint on the predictions of the generalized additive models is introduced to retain the original ranking of the customers. Inspired by rule-set creation advances in the post-learning phase (e.g. pedagogical rule-based extraction techniques as employed in Martens et al. (2007)), a rule set on the training set of fold k is produced in the post-estimation phase of the generalized additive models to obtain a function $f_k$ that is non-decreasing and monotone. This ensures that the initial ranking of $P_{kborg}$ is maintained in the corresponding predictions $P_{kbreal}$ of fold k.

Practically, the training set is sorted by $P_{kborg}$. Afterwards, the rule-based algorithm detects all violations of non-decreasing monotonicity in the prediction values $f_k(P_{kborg})$ on the training set. For instance, suppose that the prediction value for bin X+1 is lower than the prediction value for bin X; then the rule-based algorithm adds a rule to the rule base to change the prediction value of bin X+1 to the larger prediction value of bin X. In the end, the generalized additive model and the rule base together describe a non-decreasing monotone function $f_k$ with the following characteristic (Denlinger, 2010):

$$P_{kborg,X} \le P_{kborg,X+1} \;\Rightarrow\; f_k(P_{kborg,X}) \le f_k(P_{kborg,X+1}) \qquad (7)$$

with $P_{kborg,X}$ and $P_{kborg,X+1}$ the original non-calibrated posterior probabilities for bins X and X+1 in the training data set, and $f_k(P_{kborg,X})$ and $f_k(P_{kborg,X+1})$ the calibrated posterior probabilities in fold k for bins X and X+1.
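Three building blocks of Section 3 are simple enough to sketch directly: the rescaling of equation (1), the equal frequency binning that produces the per-bin averages and incidences, and the rule that repairs monotonicity violations as in equation (7). The function names below are hypothetical, the inputs are toy values, and the code is a minimal illustration under those assumptions rather than the implementation used in the study.

```python
def saerens_rescale(p_org, prior_old, prior_new):
    """Equation (1) for the responder class, given old and new responder priors."""
    w1 = prior_new / prior_old                  # prior ratio for responders (c1)
    w0 = (1.0 - prior_new) / (1.0 - prior_old)  # prior ratio for non-responders (c0)
    return w1 * p_org / (w0 * (1.0 - p_org) + w1 * p_org)

def equal_frequency_bins(p_org, y_real, n_bins):
    """Sort customers by score and cut them into n_bins groups of (almost) equal size.

    Returns one (average score, real incidence) pair per bin.
    Assumes len(p_org) >= n_bins so that no bin is empty.
    """
    pairs = sorted(zip(p_org, y_real))
    n = len(pairs)
    bins = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        mean_score = sum(p for p, _ in chunk) / len(chunk)
        incidence = sum(y for _, y in chunk) / len(chunk)
        bins.append((mean_score, incidence))
    return bins

def enforce_non_decreasing(predictions):
    """Walk through bins sorted by original score; if bin X+1 is predicted
    lower than bin X, raise it to the (already repaired) value of bin X."""
    repaired = []
    for value in predictions:
        repaired.append(max(value, repaired[-1]) if repaired else value)
    return repaired

# A score equal to the old prior (0.5) is moved to the new prior (0.1).
print(saerens_rescale(0.5, prior_old=0.5, prior_new=0.1))
# Two bins built from four toy customers.
print(equal_frequency_bins([0.4, 0.1, 0.3, 0.2], [1, 0, 1, 0], n_bins=2))
# The third and fifth predictions violate monotonicity and are raised.
print(enforce_non_decreasing([0.02, 0.05, 0.04, 0.09, 0.07]))
```

The repair step only ever raises a prediction to the value of its predecessor, so ties can appear among calibrated values, but the order given by the original scores is never reversed.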
4. Empirical validation

The calibration methods are employed on a test bed of 9 real-life direct marketing datasets provided by a large European financial institution. Each of these datasets corresponds to a typical financial product. Table 1 shows the characteristics of the response datasets.

[INSERT TABLE 1 OVER HERE]

With the aim of methodologically comparing the different algorithms, a 10-fold cross-validation is applied. Furthermore, the classifier C which links $TRAIN_M$ and $N$ and outputs $P_{org}$ is a logistic regression with forward variable selection, as it is a robust and well-known classification technique in the marketing environment (Neslin et al., 2006). Moreover, the calibration approaches based on generalized additive models use different levels of degrees of freedom (df) representing the non-linearity of the model. The higher the df, the higher the non-linearity. On the one hand, the df can be set manually by the researcher (user-specified), while on the other hand the df can be estimated simultaneously with the shape of the response function (automatic). This study opts to manually set the df equal to {3, 4, 5} (resulting in GAMdf and GAMdf MONO). This df range is inspired by the recommendation and the applications in Hastie & Tibshirani (1990) and Hastie et al. (2001), which use a relatively small number of df to account for different levels of non-linearity. Additionally, the generalized cross-validation procedure (GCV) is employed to automatically select the ideal number of df, resulting in GAMgcv and GAMgcv MONO (Gu & Wahba, 1991; Wood, 2000; 2004). The number of bins b for $N$ and $REAL$ is set to 200. Furthermore, $P_{org}$, the non-calibrated posterior probabilities of $N$, are used as a benchmark (ORIGINAL). The different algorithms are compared on an individual customer level using the log-likelihood (LL) by

$$LL = \sum_{i=1}^{n} \ln\!\left( p(x_i)^{y_i} \, (1 - p(x_i))^{1 - y_i} \right) = \sum_{i=1}^{n} \left[ y_i \ln p(x_i) + (1 - y_i) \ln\left(1 - p(x_i)\right) \right] \qquad (8)$$

with $n$ the number of customers, $p(x_i)$ equal to $P_{knew}$, the calibrated posterior response probability, and $y_i$ as the real response variable with $y_i \in \{0, 1\}$.
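As a small worked illustration of equation (8), the sketch below computes the log-likelihood for two hypothetical sets of probabilities on the same four customers. The function name `log_likelihood`, the toy values and the clipping constant `eps` are assumptions of this sketch; the clipping only guards against taking the logarithm of zero and is not part of equation (8).

```python
import math

def log_likelihood(y_real, p_cal, eps=1e-12):
    """Equation (8): sum of y*ln(p) + (1 - y)*ln(1 - p) over all customers."""
    total = 0.0
    for y, p in zip(y_real, p_cal):
        p = min(max(p, eps), 1.0 - eps)  # added safeguard, not in equation (8)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

y_real = [1, 0, 0, 0]                # one real responder out of four customers
p_inflated = [0.6, 0.5, 0.4, 0.5]    # scores from a model trained at high incidence
p_calibrated = [0.3, 0.2, 0.1, 0.2]  # same ranking, lower overall level

ll_inflated = log_likelihood(y_real, p_inflated)
ll_calibrated = log_likelihood(y_real, p_calibrated)
print(ll_inflated, ll_calibrated)
```

Both score vectors rank the responder first, so a pure ranking measure would not separate them; the log-likelihood is higher for the second vector because its overall level is closer to the observed incidence.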
The LL is a well-known metric in (direct) marketing to evaluate the performance of an algorithm (e.g. Baumgartner & Hruschka, 2005). The higher the LL, the better the calibration of the posterior probabilities to the true response distribution. Moreover, the non-parametric Friedman test (Demšar, 2006; Friedman, 1937, 1940) with the Bonferroni-Dunn test (Dunn, 1961) is used to test whether the different approaches differ significantly from the best performing algorithm.

5. Results

Table 2 presents the 10-fold cross-validated log-likelihood values for the different datasets and the different algorithms. Three panels (a, b, c) are included, representing the various levels of the user-selected degrees of freedom for the generalized additive model mappings. For each dataset, the best performing algorithm in terms of log-likelihood is put in italics. Moreover, the average ranking (AR) per algorithm over the different datasets is given. The lower the ranking, the better the algorithm is shown to be. The best performing algorithm is underlined and set in bold, while the algorithms that are not significantly different from the best one at a 5% significance level are only set in bold.

[INSERT TABLE 2 OVER HERE]

The algorithms are split into 4 categories: the original, non-calibrated posterior probabilities (ORIGINAL), the rescaling methodology (SAERENS), the linear probability-mapping approach (GLM) and the non-linear probability-mapping approaches (LOG, GAMdf, GAMdf MONO, GAMgcv and GAMgcv MONO). Table 2 reveals that calibrating the posterior probabilities has a beneficial impact when a discrepancy exists between the true prior probabilities of the training set and the test set: ORIGINAL always performs worse than the calibration approaches.
Comparing the rescaling approach (SAERENS) with the best performing calibration approaches, one concludes that SAERENS always performs significantly worse than the non-linear probability-mapping approaches, while SAERENS performs better than the linear probability-mapping approach (GLM). These results show that the analyst should move towards a non-linear probability-mapping approach, despite the fact that SAERENS is an easy and workable solution to the calibration problem. Contrasting the various probability-mapping approaches, Table 2 discloses that the non-linear calibration approaches (LOG, GAMdf, GAMdf MONO, GAMgcv and GAMgcv MONO) are always amongst the best performing algorithms. The linear mapping approach (GLM) is never significantly competitive with any of its non-linear counterparts. However, the generalized linear model with log transformation (LOG) is competitive with the more advanced GAM approaches (GAMdf, GAMdf MONO, GAMgcv and GAMgcv MONO). Within the non-linear calibration setting, one concludes that GAMgcv MONO always performs best, followed by the other non-linear calibration approaches.

Table 3 contains the performance measures for all generalized additive model approaches (GAMdf, GAMdf MONO, GAMgcv and GAMgcv MONO), for all the levels of degrees of freedom. On a dataset level, the best performing algorithm is put in italics. Furthermore, the average ranking (AR) for each algorithm is given and the best performing algorithm (i.e. the one with the lowest ranking) is underlined and set in bold, while the ones that are not significantly different from the best at a 5% significance level are simply put in bold.

[INSERT TABLE 3 OVER HERE]

Table 3 reveals that GAM5 MONO is the best performing algorithm amongst the GAM and GAM MONO approaches, closely followed by GAMgcv MONO. Table 3 shows a better performance trend for the GAM approaches when the number of df is increased. GAM3 performs less well than GAM4, while GAM4 performs less well than GAM5.
Furthermore, it is clear that including the monotonicity constraint has a beneficial impact on the calibration performance of the GAM approaches. The average ranking of the GAM approaches including the monotonicity constraint is always better than that of their original GAM counterparts (i.e. GAMdf versus GAMdf MONO and GAMgcv versus GAMgcv MONO). Moreover, the automatic smoothness parameter selection procedure proves its beneficial impact. For the models without monotonicity constraint, GAMgcv always has a better ranking than the GAMdf approaches. For the models with monotonicity constraint, GAMgcv MONO always performs better than GAM3 MONO and GAM4 MONO, while GAMgcv MONO is very competitive with GAM5 MONO.

6. Discussion

The results suggest that marketing analysts should calibrate the posterior probabilities when the training set does not represent the true prior distribution. In general, calibrating the posterior probabilities is more beneficial than using the non-calibrated posterior probabilities. Moreover, it is shown that a simple rescaling algorithm (SAERENS) that takes into account the ratio of the old and the new priors is not sufficient as a first, workable solution to the calibration problem. SAERENS always performs significantly worse than the more complex non-linear probability-mapping approaches. Furthermore, marketing researchers should not apply the linear probability-mapping approach in this specific setting. Indeed, amongst the different probability-mapping approaches, it has been shown that non-linear approaches are preferable over the linear mappings. The LOG approach is competitive with the more complex GAM-based calibration approaches, and because it is based on the common generalized linear model framework, LOG could be seen as a first, workable approach. However, if one is interested in optimizing the calibration performance, the GAM-based approaches are preferable. Moreover, one concludes that using the automatic smoothing parameter selection procedure and imposing a monotonicity constraint on the GAM method are the preferred options in GAM models in order to optimize calibration performance.

7. Conclusion

Direct marketing receives considerable attention these days in academia as well as in business due to a serious drop in the cost of IT equipment and the ever increasing usage of response
models in a variety of business settings. In a direct marketing context, a problematic discrepancy sometimes exists between the prior distributions on the training set and the scoring set. This may happen because the training set consists entirely of customers previously selected by a response model, and thus this dataset contains a higher percentage of responders. Applying a classification model built on this training set to the complete set of customers will harm the estimation of the response probabilities. Thoroughly adjusting the posterior probabilities to the real response probability distribution will improve the classification performance.

This study reveals that the non-linear probability-mapping approaches are amongst the best performing algorithms and their usage is highly recommended in a day-to-day business setting for the following reasons. Firstly, the non-linear probability-mapping approaches deliver a better performance compared to the other calibration algorithms included in this research paper. As a result, the calibrated probabilities better reflect the true probabilities of response. Secondly, it is possible to visualize the relationship between $P_{kborg}$ and $P_{kbreal}$. This gives managers a better, visual understanding of the calibration process for a particular setting. For instance, the further the calibration curve is from the 45 degree line (i.e. the line where $P_{kborg} = P_{kbreal}$ and no calibration is necessary), the higher the added value of sending a leaflet, because the incidence in $TRAIN_M$ is higher than in $REAL$. Finally, the underlying techniques like generalized linear models and generalized additive models are easily implementable in today's business environment due to the availability of the classifiers in traditional software packages like SAS and R.

Whilst we are confident that our study adds significant value to the literature, valuable directions for future research are identified.
Besides the probability-mapping approaches, which map $P_{kborg}$ onto $P_{kbreal}$, an extensive research project could be dedicated to investigating the impact of integrated calibration approaches, i.e. methods that integrate the calibration process into the initial training phase of classifier C in order to come up with a new classifier which directly outputs calibrated probabilities. For instance, a workable integrated calibration approach could be represented by a two-stage Bayesian logistic regression approach that directly outputs calibrated posterior probabilities. In order to obtain this integrated Bayesian calibration model, the following procedure is proposed. Under the assumption that the commonly used prior distribution for $\beta_k$ is multivariate Gaussian, i.e. $p(\beta_k) \sim N(\beta_0, \Sigma_0)$, the empirical Bayes approach could be used to specify the values of
$\beta_0$ and $\Sigma_0$ by fitting a Bayesian logistic regression to $TRAIN_{kM}$ using non-informative priors. Consequently, the resulting posterior mean vector and variance-covariance matrix of this initial model could then be used as the values of $\beta_0$ and $\Sigma_0$ for the second Bayesian logistic regression on $REAL$. The resulting integrated Bayesian logistic regression classifier will directly output adapted, calibrated posterior probabilities.¹ Furthermore, the probability-mapping approaches are validated in a direct marketing setting, whereas future research efforts could investigate the external validity in other operational research settings.

Acknowledgements

The authors would like to thank the anonymous company for freely distributing the datasets. We would like to thank our friendly reviewers and the journal reviewers for their fruitful comments on earlier versions of this paper, and the editor, Jesus Artalejo, for guiding this paper through the reviewing process.

¹ Nevertheless, this approach is not tested in the current version of the paper for confidentiality reasons.
References

Allenby, G. M., Leone, R. P., & Jen, L. C. (1999). A dynamic model of purchase timing with application to direct marketing. Journal of the American Statistical Association, 94.
Baesens, B., Viaene, S., Van den Poel, D., Vanthienen, J., & Dedene, G. (2002). Bayesian neural network learning for repeat purchase modeling in direct marketing. European Journal of Operational Research, 138.
Baumgartner, B., & Hruschka, H. (2005). Allocation of catalogs to collective customers based on semiparametric response models. European Journal of Operational Research, 162.
Bose, I., & Chen, X. (2009). Quantitative models for direct marketing: A review from systems perspective. European Journal of Operational Research, 195.
Bosio, S., & Righini, G. (2007). Computational approaches to a combinatorial optimization problem arising from text classification. Computers & Operations Research, 34.
Bult, J. R. (1993). Semiparametric Versus Parametric Classification Models: An Application to Direct Marketing. Journal of Marketing Research, 30.
Conforti, D., & Guido, R. (2010). Kernel based support vector machine via semidefinite programming: Application to medical diagnosis. Computers & Operations Research, 37.
Crone, S. F., Lessmann, S., & Stahlbock, R. (2006). The impact of preprocessing on data mining: An evaluation of classifier sensitivity in direct marketing. European Journal of Operational Research, 173.
Deichmann, J., Eshghi, A., Haughton, D., Sayek, S., & Teebagy, N. (2002). Application of multiple adaptive regression splines (MARS) in direct response modeling. Journal of Interactive Marketing, 16.
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7.
Denlinger, C. G. (2010). Elements of real analysis. Jones and Bartlett Publishers.
Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56.
Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32.
Friedman, M. (1940). A comparison of alternative tests of significance for the problem of m rankings. The Annals of Mathematical Statistics, 11.
Green, P. J., & Silverman, B. W. (1994). Nonparametric regression and generalized linear models. Chapman and Hall/CRC Press.
Gu, C., & Wahba, G. (1991). Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM Journal of Scientific and Statistical Computing, 12.
Hastie, T., & Tibshirani, R. (1986). Generalized additive models. Statistical Science, 1.
Hastie, T., & Tibshirani, R. (1987). Generalized Additive Models: Some applications. Journal of the American Statistical Association, 82.
Hastie, T., & Tibshirani, R. (1990). Generalized Additive Models. London: Chapman and Hall.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference and Prediction. New York: Springer-Verlag.
Haughton, D., & Oulabi, S. (1997). Direct marketing modeling with CART and CHAID. Journal of Direct Marketing, 11.
Hruschka, H. (2010). Considering endogeneity for optimal catalog allocation in direct marketing. European Journal of Operational Research, 206.
Kim, H. S., & Sohn, S. Y. (2010). Support vector machines for default prediction of SMEs based on technology credit. European Journal of Operational Research, 201.
Lamb, C. W., Hair, J. F., & McDaniel, C. (1994). Principles of marketing (second ed.). Cincinnati: South-Western Publishing Co.
Lee, H. J., Shin, H., Hwang, S. S., Cho, S., & MacLachlan, D. (2010). Semi-Supervised Response Modeling. Journal of Interactive Marketing, 24.
Martens, D., Baesens, B., Van Gestel, T., & Vanthienen, J. (2007). Comprehensible credit scoring models using rule extraction from support vector machines. European Journal of Operational Research, 183.
Martens, D., Van Gestel, T., De Backer, M., Haesen, R., Vanthienen, J., & Baesens, B. (2010). Credit rating prediction using Ant Colony Optimization. Journal of the Operational Research Society, 61.
Morales, D. R., & Wang, J. B. (2010). Forecasting cancellation rates for services booking revenue management using data mining. European Journal of Operational Research, 202.
Naik, P. A., Hagerty, M. R., & Tsai, C. L. (2000). A new dimension reduction approach for data-rich marketing environments: Sliced inverse regression. Journal of Marketing Research, 37.
Neslin, S. A., Gupta, S., Kamakura, W., Lu, J. X., & Mason, C. H. (2006). Defection detection: Measuring and understanding the predictive accuracy of customer churn models. Journal of Marketing Research, 43.
Paleologo, G., Elisseeff, A., & Antonini, G. (2010). Subagging for credit scoring models. European Journal of Operational Research, 201.
Piersma, N., & Jonker, J. J. (2004). Determining the optimal direct mailing frequency. European Journal of Operational Research, 158.
Saerens, M., Latinne, P., & Decaestecker, C. (2002). Adjusting the outputs of a classifier to new a priori probabilities: A simple procedure. Neural Computation, 14.
Tabachnick, B. G., & Fidell, L. S. (1996). Using multivariate statistics. HarperCollins Publishers, New York.
Wahba, G. (1990). Spline models for observational data. Society for Industrial and Applied Mathematics (SIAM), Capital City Press, Montpelier (Vermont).
Wood, S. N. (2000). Modelling and Smoothing Parameter Estimation with Multiple Quadratic Penalties. Journal of the Royal Statistical Society B, 62.
Wood, S. N. (2004). Stable and efficient multiple smoothing parameter estimation for generalized additive models. Journal of the American Statistical Association, 99.
Wood, S. N. (2008). Fast stable direct fitting and smoothness selection for generalized additive models. Journal of the Royal Statistical Society B, 70.
Zahavi, J., & Levin, N. (1997). Applying neural computing to target marketing. Journal of Direct Marketing, 11.
Figure 1: Methodological framework. [Figure not reproduced. Labels recoverable from the diagram: TRAIN_M, classifier C, k = 1 to 10, k1 ... kb, ORIGINAL, SAERENS, LIN, LOG, GAM, GAM MONO, NEW (NEW_k1 ...), REAL (REAL_k1 ... REAL_kb).]
Table 1: Dataset characteristics. [Table not reproduced; cell values were lost in transcription. Recoverable column headers: Dataset ID; TRAIN_M; # customers; % responders; # customers; % responders; # variables used by C.]
Table 2: The 10-fold cross-validated log-likelihood values. Panel a: overview with GAM3 & GAM3 MONO. Panel b: overview with GAM4 & GAM4 MONO. Panel c: overview with GAM5 & GAM5 MONO. [Table not reproduced; cell values were lost in transcription. Recoverable headers per panel: DATASET; ORIGINAL; RESCALING; SAERENS; PROBABILITY-MAPPING with LINEAR and NON-LINEAR groups; GLM; LOG; GAM3/GAM4/GAM5; GAMgcv; the corresponding MONO variants; final row AR. Note: values are 10-fold CV LL values, AR = average ranking.]
Table 3: The 10-fold cross-validated log-likelihood values for GAM and GAM MONO calibration models. [Table not reproduced; cell values were lost in transcription. Recoverable headers (NON-LINEAR): DATASET; GAM3; GAM4; GAM5; GAMgcv; GAM3 MONO; GAM4 MONO; GAM5 MONO; GAMgcv MONO; final row AR. Note: values are 10-fold CV LL values, AR = average ranking.]
