European Journal of Operational Research

European Journal of Operational Research 214 (2011)

Decision Support

A probability-mapping algorithm for calibrating the posterior probabilities: A direct marketing application

Kristof Coussement a,*, Wouter Buckinx b

a IESEG School of Management, Catholic University of Lille (LEM, UMR CNRS 8179), Department of Marketing, 3 Rue de la Digue, F Lille, France
b Python Predictions, Avenue R. Van den Driessche 9, B-1150 Brussels, Belgium

Article history: Received 3 October 2010; Accepted 16 May 2011; Available online 23 May 2011

Keywords: Data mining; Decision support systems; Direct marketing; Response modeling; Calibration

Abstract: Calibration refers to the adjustment of the posterior probabilities output by a classification algorithm towards the true prior probability distribution of the target classes. This adjustment is necessary to account for the difference in prior distributions between the training set and the test set. This article proposes a new calibration method, called the probability-mapping approach. Two types of mapping are proposed: linear and non-linear probability mapping. These new calibration techniques are applied to 9 real-life direct marketing datasets. The newly-proposed techniques are compared with the original, non-calibrated posterior probabilities and with the adjusted posterior probabilities obtained using the rescaling algorithm of Saerens et al. (2002). The results indicate that marketing researchers should calibrate the posterior probabilities obtained from the classifier. Moreover, it is shown that a simple rescaling algorithm is not a sufficient solution, because the results suggest applying the newly-proposed non-linear probability-mapping approach for the best calibration performance.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction
Due to recent developments in IT infrastructure and the ever-increasing trust placed in complex computer systems, analysts are showing an increasing interest in classification modeling in a variety of disciplines such as credit scoring (Martens et al., 2010; Paleologo et al., 2010), medicine (Conforti and Guido, 2010), text classification (Bosio and Righini, 2007), SMEs fund management (Kim and Sohn, 2010), revenue management (Morales and Wang, 2010), and so on. The same interests are shared by the direct marketing community. Direct marketing analysts have an increasing interest in building prediction models that assign a probability of response to each and every individual customer in the database (Lamb et al., 1994). The task of classification is made even more interesting by the fact that current marketing environments store incredible amounts of customer information at a very low cost, including socio-demographics, transactional buying behavior, attitudinal data, etc. (Naik et al., 2000), while at the same time there has been a tremendous increase in academic interest in direct marketing applications (e.g. Allenby et al., 1999; Baumgartner and Hruschka, 2005; Hruschka, 2010; Lee et al., 2010; Piersma and Jonker, 2004). Response models are therefore defined as classification models that attempt to discriminate between responders and non-responders for a certain company mailing.

* Corresponding author. E-mail addresses: [email protected] (K. Coussement), [email protected] (W. Buckinx).

In the past, purely statistical methods like logistic regression, discriminant analysis and naive Bayes models have been proposed to discriminate between responders and non-responders in a direct marketing context (Baesens et al., 2002; Bult, 1993; Deichmann et al., 2002).
Although these techniques may be very effective, they make a stringent assumption about the underlying relationship between the independent variables and the dependent or response variable. In response to this, more advanced data mining algorithms like decision tree-generating techniques, artificial neural networks and support vector machines have been applied (Baesens et al., 2002; Bose and Chen, 2009; Crone et al., 2006; Haughton and Oulabi, 1997; Zahavi and Levin, 1997). All these binary classification models are used for two reasons. First, researchers rely on them to obtain robust parameter estimates of the independent variables by modeling the probability of response as a function of the independent variables. Second, these models are used to obtain consistent predicted probabilities of response, which are then used (i) to rank the customers based on their responsiveness to the campaign, (ii) to optimize the overall campaign strategy by offering the customer the product with the highest response probability over the different response models, and (iii) for the discrimination task of the response event itself, where one classifies customers into responders and non-responders. For (ii) and (iii), the absolute size of the posterior response probabilities is crucial. This study focuses on the process of obtaining correct response probabilities, where calibrating the posterior probabilities could have a positive impact on the optimization of the overall campaign strategy and the efficiency of the discrimination task.

In practice, a classification model is built on a training set, i.e. a set of customers for whom both the independent variables and the dependent variable are present. In order to correctly measure the discrimination power of the trained classifier, the classification model is applied to a group of customers who have not been used for training, called the scoring or test set. The purpose is to obtain robust and consistent predictions for the response probability of these unseen customers. As one wants to divide the customers into responders and non-responders, a judicious classification based on the posterior response probabilities of the customers is needed. In other words, customers with a response probability exceeding a certain threshold will be classified as responders and vice versa. However, it often happens that a classifier is trained on a dataset that does not reflect the true prior probabilities of the target classes in the real-life population. This may have serious negative consequences for the discrimination performance because the posterior probabilities do not reflect the true probability of response. This phenomenon occurs in a direct marketing context as well, where the prior probabilities of the training set and the (out-of-sample) test set differ significantly. More specifically, the training set consists of customers who were preselected by an earlier response model as customers with a high response probability, while the test set does not impose any restrictions on the customer profiles in the database. In such a case, a large discrepancy exists between the response distributions of the training set and the test set. The incidence, which is the percentage of responders in a data set, is much higher in the training set than in the out-of-sample test set.
This inconsistency has a negative effect on the discrimination performance on the test set, especially because the classifier's decision to classify customers into responders or non-responders is based on setting a threshold on the raw posterior probabilities of class membership. For instance, when a classifier is trained on a dataset with a higher incidence than the one in the test set, the posterior probabilities on the test set are inflated. Thus, making a classification decision based on the absolute value of the posterior probabilities may significantly harm the discrimination performance. Moreover, optimizing the campaign strategy by offering the product with the highest response probability to the customer becomes useless because the response probabilities for different products for a particular customer are not comparable. This paper focuses on how researchers can adjust the posterior probabilities based on the true prior distribution of the response variable. This process of adjustment is called calibration. This paper proposes a new methodology to calibrate the posterior probabilities of the test set with the real-world situation, a process called probability-mapping. It maps the posterior response probabilities obtained from the classifier onto the prior distribution of real response. The new probability-mapping approaches, using generalized linear models and non-parametric generalized additive models, are compared with the original, non-calibrated posterior probabilities and with the probabilities calibrated using the rescaling methodology of Saerens et al. (2002). This paper is structured as follows: Section 2 describes the methodological framework, while Section 3 explores the different calibration approaches (rescaling approaches and probability-mapping approaches). Section 4 explains the characteristics of the empirical validation, while Section 5 explores the results.
Section 6 gives managerial recommendations, and finally Section 7 concludes this paper.

2. Methodological framework

Fig. 1 shows the methodological framework for the different calibration methods applied in this study. Define a training set TRAIN_M = {(x_i, y_i)}_{i=1..m} consisting of m customers. Each customer (x_i, y_i) is a combination of an input vector x_i representing the independent variables and a dependent variable y_i ∈ {0, 1} corresponding to whether or not the customer responded to a certain mailing. TRAIN_M consists of all customers who were selected by a previous response model, thus received a direct mailing to buy the product, and were therefore flagged as customers with a high response probability. During the training phase, a classifier C maps the input vector space onto the binary response variable using the training set observations. The trained classifier C is then applied to the test set TEST_N = {x_i}_{i=1..n} consisting of n customers, and for every customer in TEST_N a response probability P_org is obtained. The purpose of this paper is to adjust the posterior probabilities P_org to the real response distribution because the training sample TRAIN_M is not representative of TEST_N, which corresponds to the true population. Therefore, for every observation x_i in TEST_N, the real response is collected and summarized in REAL_N = {y_i}_{i=1..n} with y_i ∈ {0, 1} corresponding to whether or not the customer spontaneously bought that particular product in a time window without direct mailing actions. The real response represents a response of pure interest in the product. In other words, REAL_N is used to represent the true prior probabilities. The purpose of the calibration phase is to adjust P_org, the non-calibrated posterior probabilities of TEST_N, so that they truly represent the probability of response. With the aim of methodologically benchmarking the different calibration methods, a k-fold cross-validation is applied.
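Such a cross-validated split can be sketched in a few lines of Python. This is an illustrative sketch with our own function and parameter names (not the authors' code); scikit-learn's KFold offers the same functionality.

```python
import numpy as np

def kfold_indices(n, k=10, seed=42):
    """Randomly split n observations into k roughly equal folds.

    Returns a list of (train_idx, test_idx) pairs: each fold in turn is
    scored while the remaining k - 1 folds train the calibration model.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)                 # random shuffle of 0..n-1
    folds = np.array_split(idx, k)           # k roughly equal parts
    return [(np.concatenate(folds[:i] + folds[i + 1:]), folds[i])
            for i in range(k)]
```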
In a k-fold cross-validation, the dataset is randomly split into k equal parts, each of which is used in turn during the scoring phase, while the other k − 1 parts are used for training the calibration model. Note that TEST_kN (REAL_kN) represents the kth fold of TEST_N (REAL_N), while P_k,org represents the non-calibrated posterior probabilities of TEST_kN.

3. Calibration approaches

Two types of calibration methods are applied: (i) the rescaling algorithm of Saerens et al. (2002) and (ii) the newly-proposed probability-mapping approaches. The former rescales P_k,org, the posterior probabilities of TEST_kN, taking into account the real incidence of REAL_kN (Saerens et al., 2002), while the latter adjusts the posterior probabilities of TEST_kN by mapping them onto the real responses of REAL_kN.

3.1. Rescaling algorithm (SAERENS)

This section explains the methodology of Saerens et al. (2002). The starting point of this calibration approach is Bayes' rule: the posterior probabilities of response depend in a non-linear way on the prior probability distribution of the target classes. The prior probability distribution of the target class is defined as the incidence of the target class, or in this setting the percentage of responders in the dataset. Therefore, a change in the prior probability distribution of the target classes changes the posterior response probabilities of the classification model. Saerens et al. (2002) describe a process that adjusts the posterior probabilities of response output by the classifier to the new prior probability distribution of the target classes using a predefined rescaling formula. In detail, the calibrated posterior probabilities of response for the customers in the test set of fold k are obtained by weighting the non-calibrated posterior probabilities, P_k,org, by the ratio of the response incidence of REAL_kN, i.e.
the new prior probability distribution, to the response incidence in the training set, i.e. the old prior probability distribution. The denominator is a scaling factor to make sure that the calibrated posterior probabilities sum up to one.
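The ratio-weighting just described can be sketched in Python as follows. This is an illustrative implementation of the rescaling step under our own naming, not the authors' code.

```python
import numpy as np

def saerens_rescale(p_org, prior_new, prior_old):
    """Rescale posteriors of the positive class towards new priors.

    p_org: non-calibrated posterior probabilities from the classifier.
    prior_new / prior_old: new and old prior probabilities of response,
    i.e. the incidences of REAL_kN and the training set respectively.
    The denominator renormalizes so the two class probabilities sum to one.
    """
    p_org = np.asarray(p_org, dtype=float)
    w1 = prior_new / prior_old                        # weight for class c1 (responders)
    w0 = (1.0 - prior_new) / (1.0 - prior_old)        # weight for class c0
    num = w1 * p_org
    return num / (w0 * (1.0 - p_org) + num)
```

Note that when the old and new priors coincide, the weights are both 1 and the posteriors are returned unchanged.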

[Fig. 1. Methodological framework.]

In summary,

P_k,new = { [P_k(c1)/P_kt(c1)] · P_k,org } / { [P_k(c0)/P_kt(c0)] · (1 − P_k,org) + [P_k(c1)/P_kt(c1)] · P_k,org }   (1)

with P_k,new representing the calibrated posterior response probabilities in fold k, and P_k(c_i) and P_kt(c_i) the new and old prior probabilities for class i, with i ∈ {0, 1}. A data set NEW_kN is obtained which contains P_k,new, the calibrated posterior probabilities for the test data of TEST_kN.

3.2. Probability-mapping approaches

The purpose of the probability-mapping approaches is to map P_k,org, the old posterior probabilities of TEST_kN, onto the real response probabilities of REAL_kN. As such, one is able to build a classification model that maps the non-calibrated probabilities onto the real response probabilities. This model is then used to calibrate the old probabilities into corrected probabilities of response. However, the real probability distribution of the target classes is not directly available from REAL_kN, which only contains the real responses y_i ∈ {0, 1} on an individual customer level. In order to convert these individual real responses in REAL_kN into a real response probability distribution, a number of bins b are constructed. The incidence of response is calculated per bin and equals the percentage of real response; this incidence is used as an approximation of the real probability of response per bin. In practice, both TEST_kN and REAL_kN are split into b bins using the equal-frequency binning approach based on the posterior probabilities of TEST_kN. TEST_kb (REAL_kb) represents the bth bin in the kth fold of TEST_kN (REAL_kN respectively).
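The equal-frequency binning step can be sketched as follows (illustrative Python with our own function names, not the authors' code): customers are sorted on their non-calibrated posterior, cut into equally sized bins, and per bin the average posterior and the real response incidence are computed.

```python
import numpy as np

def bin_probabilities(p_org, y_real, n_bins=200):
    """Equal-frequency binning of test-set posteriors.

    Returns, per bin, the average non-calibrated posterior (P_kb,org)
    and the real response incidence (P_kb,real); the latter serves as a
    proxy for the true probability of response in that bin.
    """
    p_org = np.asarray(p_org, dtype=float)
    y_real = np.asarray(y_real, dtype=float)
    order = np.argsort(p_org)                        # sort customers on posterior
    splits_p = np.array_split(p_org[order], n_bins)  # equal-frequency bins
    splits_y = np.array_split(y_real[order], n_bins)
    p_kb_org = np.array([s.mean() for s in splits_p])
    p_kb_real = np.array([s.mean() for s in splits_y])
    return p_kb_org, p_kb_real
```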
TEST_kb and REAL_kb logically contain identical observations, while P_kb,org is the average non-calibrated posterior probability of the bth bin in TEST_kN and P_kb,real is the percentage of real responders in the bth bin of REAL_kN. P_kb,real serves as a proxy for the true prior probability. In order to formalize the relationship between the average posterior probabilities of TEST_kN and the approximate real probabilities obtained from REAL_kN, a formal mapping is obtained using the binned training set of fold k by

P_kb,real = f_k(P_kb,org)   (2)

with f_k being the classifier that maps the non-calibrated posterior probabilities onto the real probabilities in fold k. After the classifier f_k is built, it is applied to the unseen test data of TEST_kN to obtain the new posterior probabilities, P_k,new, for every individual in the test data set of the kth fold. A new data set NEW_kN is obtained which contains P_k,new, the calibrated posterior probabilities. There are several possibilities for f_k, a function that links the estimated, non-calibrated probabilities of TEST_kb to the approximated real probabilities of REAL_kb. This study uses one linear probability-mapping approach based on generalized linear models (Section 3.2.1) and three non-linear approaches: one based on generalized linear models with log-transformed non-calibrated probabilities (Section 3.2.2) and two approaches based on generalized additive models (Sections 3.2.3 and 3.2.4).

3.2.1. Generalized linear model (GLM)

Given y_i as the dependent variable with y_i ∈ [0, 1] representing P_kb,real, the averaged true prior probabilities from REAL_kb, and x_i equal to P_kb,org, the averaged posterior probabilities of TEST_kb, a generalized linear model with logit link function is employed to model f_k(x_i) ∈ [0, 1].
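A simplified sketch of such a mapping is given below. Note the hedges: instead of the full maximum-likelihood GLM fit, the intercept and slope are estimated here by least squares in the log-odds space, which is only an approximation of the paper's approach; the `log_transform` flag and the `eps` clipping guard are our own additions (the flag gives the log-transformed variant discussed next).

```python
import numpy as np

def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_probability_map(p_kb_org, p_kb_real, log_transform=False, eps=1e-6):
    """Fit a linear (or log-transformed) probability mapping on the bins.

    Fits logit(P_kb,real) ~ alpha_k + beta_k * x by least squares, with
    x = P_kb,org, or x = log(P_kb,org) when log_transform is True.
    Returns a function mapping raw posteriors to calibrated ones.
    """
    x = np.asarray(p_kb_org, dtype=float)
    if log_transform:
        x = np.log(x)
    y_clipped = np.clip(p_kb_real, eps, 1.0 - eps)   # keep logit finite
    y = np.log(y_clipped / (1.0 - y_clipped))
    beta, alpha = np.polyfit(x, y, 1)                # slope, intercept
    def f_k(p):
        z = np.log(p) if log_transform else np.asarray(p, dtype=float)
        return inv_logit(alpha + beta * z)
    return f_k
```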
Moreover, it assumes that the relationship between P_kb,org and P_kb,real is linear in the log-odds via

logit{y_i} = log[ y_i / (1 − y_i) ] = α_k + β_ki · x_i   (3)

or

y_i ≈ f_k(x_i) = logit⁻¹(α_k + β_ki · x_i)   (4)

with α_k as the intercept and β_ki · x_i as the predictor. The parameters α_k and β_ki are estimated using maximum likelihood (Tabachnick and Fidell, 1996).

3.2.2. Generalized linear model with log transformation (LOG)

Another approach is to log-transform x_i in Eqs. (3) and (4), because as such one captures the non-linearity in the log-odds space between y_i (P_kb,real, the true prior probabilities from REAL_kb) and x_i (P_kb,org, the posterior probabilities of TEST_kb).

3.2.3. Generalized additive models

An attractive alternative to standard generalized linear models is generalized additive models (Hastie and Tibshirani, 1986, 1987, 1990). Generalized additive models relax the linearity constraint and apply a non-parametric non-linear fit to the data. In other words, the data themselves decide on the functional form

between the independent variable and the dependent variable. Define y_i as the dependent variable with y_i ∈ [0, 1] representing P_kb,real, the approximated true prior probabilities from REAL_kb, and x_i equal to P_kb,org, the posterior probabilities of TEST_kb. To model f_k(x_i) ∈ [0, 1], generalized additive models with logit link function are employed. Methodologically, generalized additive models generalize the generalized linear model principle by replacing the linear predictor β_ki · x_i in Eq. (4) with an additive component:

y_i ≈ f_k(x_i) = logit⁻¹(α_k + s_ki(x_i))   (5)

with s_ki(x_i) as a smooth function. This study uses penalized regression splines s_ki(x_i) to estimate the non-parametric trend for the dependency of y_i on x_i (Wahba, 1990; Green and Silverman, 1994). These smooth functions use a large number of knots, leading to a model quite insensitive to the knot locations, while the penalty term is used to avoid the danger of over-fitting that would otherwise accompany the use of many knots. The complexity of the model is controlled by a parameter λ, which is inversely related to the degrees of freedom (df). If λ is small (i.e. the df are large), a very complex model that closely matches the data is employed. When λ is large (i.e. the df are small), a smooth model is considered. In order to optimize the generalized additive model, the fitting amounts to penalized likelihood maximization by penalized iteratively reweighted least squares (Wood, 2000, 2004, 2008).

3.2.4. Generalized additive models with monotonicity constraint

Due to the fact that generalized additive models produce a non-linear relationship between the independent variable P_kb,org and the dependent variable P_kb,real, the original ranking of the posterior probabilities of TEST_kN and its calibrated version may change.
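One simple way to restore a non-decreasing mapping over the bins is a running maximum over the predictions sorted by their original posterior; this is a minimal numpy sketch (our own naming) with the same effect as the rule-based post-estimation correction described below.

```python
import numpy as np

def enforce_monotone(p_kb_org, preds):
    """Non-decreasing correction of bin-level calibrated predictions.

    Sort bins by their original posterior P_kb,org and replace each
    prediction by the running maximum: wherever bin X+1 would receive a
    lower value than bin X, it is raised to bin X's value, so the
    original ranking of the bins is preserved.
    """
    preds = np.asarray(preds, dtype=float)
    order = np.argsort(p_kb_org)                   # bins in ranking order
    fixed = np.maximum.accumulate(preds[order])    # running max = non-decreasing
    out = np.empty_like(fixed)
    out[order] = fixed                             # back to original bin order
    return out
```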
However, marketing analysts could argue that the mapping from TRAIN_M onto TEST_N and the corresponding ranking of the customers in TEST_N (and respectively TEST_kN) given by the initial classifier C should be conserved. Therefore, a non-decreasing monotonicity constraint on the generalized additive model predictions is introduced to retain the original ranking of the customers. Inspired by rule-set creation advances in the post-learning phase (e.g. pedagogical rule-based extraction techniques as employed in Martens et al. (2007)), a rule set is produced on the training set of fold k in the post-estimation phase of the generalized additive models to obtain a non-decreasing monotone function f′_k. This ensures that the initial ranking of P_kb,org is maintained in the corresponding predictions P_kb,real of fold k. Practically, the training set is sorted by P_kb,org. Afterwards, the rule-based algorithm detects all non-decreasing monotonic inconsistencies in the prediction values f_k(P_kb,org) on the training set. For instance, suppose that the prediction value for bin X + 1 is lower than the prediction value for bin X; then the rule-based algorithm adds a rule to the rule base to change the prediction value of bin X + 1 to the larger prediction value of bin X. In the end, the generalized additive model and the rule base describe a non-decreasing monotone generalized additive model based function f′_k with the following characteristic (Denlinger, 2010):

if P_kb,org,X ≤ P_kb,org,X+1 then f′_k(P_kb,org,X) ≤ f′_k(P_kb,org,X+1)   (7)

with P_kb,org,X and P_kb,org,X+1 the original non-calibrated posterior probabilities for bins X and X + 1 in the training data set, and f′_k(P_kb,org,X) and f′_k(P_kb,org,X+1) the calibrated posterior probabilities in fold k for bins X and X + 1.

4. Empirical validation

The calibration methods are employed on a test bed of 9 real-life direct marketing datasets provided by a large European financial institution.
Each of these datasets corresponds to a typical financial product. Table 1 shows the characteristics of the response datasets. With the aim of methodologically comparing the different algorithms, a 10-fold cross-validation is applied. Furthermore, the classifier C that links TRAIN_M and TEST_N and outputs P_org is a logistic regression with forward variable selection, as it is a robust and well-known classification technique in the marketing environment (Neslin et al., 2006). Moreover, the calibration approaches based on generalized additive models use different levels of degrees of freedom (df) representing the non-linearity of the model; the higher the df, the higher the non-linearity. On the one hand, the df are set manually by the researcher (user-specified), while on the other hand the df are estimated automatically in correspondence with the shape of the response function (automatic). This study opts to manually set the df equal to {3, 4, 5} (resulting in GAMdf and GAMdf MONO). This df range is inspired by the recommendations and applications in Hastie and Tibshirani (1990) and Hastie et al. (2001), which use a relatively small number of df to account for different levels of non-linearity. Additionally, the generalized cross-validation procedure (GCV) is employed to automatically select the ideal number of df, resulting in GAMgcv and GAMgcv MONO (Gu and Wahba, 1991; Wood, 2000; Wood, 2004). The number of bins b for TEST_kN and REAL_kN is set to 200. Furthermore, P_org, the non-calibrated posterior probabilities of TEST_N, are used as a benchmark (ORIGINAL). The different algorithms are compared on an individual customer level using the log-likelihood (LL):

LL = ln( ∏_{i=1..N} p(x_i)^{y_i} · [1 − p(x_i)]^{1−y_i} ) = Σ_{i=1..N} { y_i · ln[p(x_i)] + (1 − y_i) · ln[1 − p(x_i)] }   (8)

with N the number of customers, p(x_i) equal to P_k,new, the calibrated posterior response probability, and y_i the real response variable with y_i ∈ {0, 1}.
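The LL criterion above is straightforward to compute; here is an illustrative Python version (the clipping constant `eps` is our own guard against taking log of 0 or 1):

```python
import numpy as np

def log_likelihood(y_real, p_new, eps=1e-12):
    """Log-likelihood of the real responses under the calibrated
    probabilities: the higher LL (closer to zero), the better the
    calibration to the true response distribution."""
    p = np.clip(np.asarray(p_new, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y_real, dtype=float)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
```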
The LL is a well-known metric in (direct) marketing to evaluate the performance of an algorithm (e.g. Baumgartner and Hruschka, 2005). The higher the LL, the better the posterior probabilities are calibrated to the true response distribution. Moreover, the non-parametric Friedman test (Demšar, 2006; Friedman, 1937, 1940) with the Bonferroni-Dunn post-hoc test (Dunn, 1961) is used to test whether the different approaches differ significantly from the best performing algorithm.

5. Results

Table 2 presents the 10-fold cross-validated log-likelihood values for the different datasets and algorithms. Three panels (a, b, c) represent the various levels of the user-selected degrees of freedom for the generalized additive model mappings. For each dataset, the best performing algorithm in terms of log-likelihood is put in italics. Moreover, the average ranking (AR) per algorithm over the different datasets is given; the lower the ranking, the better the algorithm. The best performing algorithm is underlined and set in bold, while the algorithms that are not significantly different from the best one at the 5% significance level are only set in bold. The algorithms are split into four categories: the original, non-calibrated posterior probabilities (ORIGINAL), the rescaling methodology (SAERENS), the linear probability-mapping approach (GLM) and the non-linear probability-mapping approaches (LOG, GAMdf, GAMdf MONO, GAMgcv and GAMgcv MONO). Table 2 reveals that calibrating the posterior probabilities has a beneficial impact when a discrepancy exists between the true prior probabilities of the training set and the test set: ORIGINAL always performs worse than the other calibration approaches. Comparing the rescaling approach (SAERENS) with the best performing calibration approaches, one concludes that SAERENS

[Table 1. Dataset characteristics: per dataset, the number of variables used by C and the number of customers and percentage of responders in TRAIN_M and TEST_N.]

[Table 2. The 10-fold cross-validated log-likelihood values. Panel a: overview with GAM3 & GAM3 MONO; Panel b: overview with GAM4 & GAM4 MONO; Panel c: overview with GAM5 & GAM5 MONO. AR = average ranking.]

always performs significantly less well than the non-linear probability-mapping approaches, while SAERENS performs better than the linear probability-mapping approach (GLM). These results show that the analyst had better shift towards a non-linear probability-mapping approach, despite the fact that SAERENS is an easy and workable solution to the calibration problem. Contrasting the various probability-mapping approaches, Table 2 discloses that the non-linear calibration approaches (LOG, GAMdf, GAMdf MONO, GAMgcv and GAMgcv MONO) are always among the best performing algorithms. The linear mapping approach (GLM) is never significantly competitive with its non-linear counterparts. However, the generalized linear model with log transformation (LOG) is competitive with the more advanced GAM approaches (GAMdf, GAMdf MONO, GAMgcv and GAMgcv MONO). Within the non-linear calibration setting, one concludes that GAMgcv MONO always performs best, followed by the other non-linear calibration approaches.
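As an illustration, the average ranks and the Friedman test over per-dataset log-likelihoods can be computed with SciPy. This is a sketch under our own naming; the Bonferroni-Dunn post-hoc comparison against the best algorithm is not reproduced here.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

def compare_algorithms(ll_matrix):
    """Friedman test over per-dataset log-likelihoods.

    ll_matrix: rows = datasets, columns = algorithms. Returns the average
    rank per algorithm (rank 1 = highest LL on a dataset, so lower average
    rank = better) and the Friedman test p-value.
    """
    ll = np.asarray(ll_matrix, dtype=float)
    # Rank within each dataset; negate so the highest LL gets rank 1.
    ranks = np.apply_along_axis(lambda r: rankdata(-r), 1, ll)
    avg_rank = ranks.mean(axis=0)
    stat, pvalue = friedmanchisquare(*ll.T)   # one sample per algorithm
    return avg_rank, pvalue
```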
Table 3 contains the performance measures for all generalized additive model approaches (GAMdf, GAMdf MONO, GAMgcv and GAMgcv MONO), for all levels of degrees of freedom. On a

dataset level, the best performing algorithm is put in italics. Furthermore, the average ranking (AR) for each algorithm is given, and the best performing algorithm (i.e. the one with the lowest ranking) is underlined and set in bold, while the ones that are not significantly different from the best at the 5% significance level are simply put in bold. Table 3 reveals that GAM5 MONO is the best performing algorithm among the GAM and GAM MONO approaches, closely followed by GAMgcv MONO. Table 3 shows a better performance trend for the GAM approaches when the number of df is increased: GAM3 performs less well than GAM4, while GAM4 performs less well than GAM5. Furthermore, it is clear that including the monotonicity constraint has a beneficial impact on the calibration performance of the GAM approaches. The average ranking of the GAM approaches including the monotonicity constraint is always better than that of their original GAM counterparts (i.e. GAMdf versus GAMdf MONO and GAMgcv versus GAMgcv MONO). Moreover, the automatic smoothness parameter selection procedure proves its beneficial impact. For the non-monotonicity models, GAMgcv always has a better ranking than the GAMdf approaches. For the monotonicity models, GAMgcv MONO always performs better than GAM3 MONO and GAM4 MONO, while GAMgcv MONO is very competitive with GAM5 MONO.

[Table 3. The 10-fold cross-validated log-likelihood values for the GAM and GAM MONO calibration models. AR = average ranking.]

6. Discussion

The results suggest that marketing analysts should calibrate the posterior probabilities when the training set does not represent the true prior distribution. In general, calibrating the posterior probabilities is more beneficial than using the non-calibrated posterior probabilities.
Moreover, it is shown that a simple rescaling algorithm (SAERENS) that takes into account the ratio of the old and the new priors is not sufficient to solve the calibration problem: SAERENS always performs significantly worse than the more complex non-linear probability-mapping approaches. Furthermore, marketing researchers should not apply the linear probability-mapping approach in this specific setting. Indeed, among the different probability-mapping approaches, it has been shown that the non-linear approaches are preferable to the linear mappings. The LOG approach is competitive with the more complex GAM-based calibration approaches, and because it is based on the common generalized linear model framework, LOG could be seen as a first, workable approach. However, if one wants to optimize the calibration performance, the GAM-based approaches are preferable. Moreover, one concludes that using the automatic smoothing parameter selection procedure and imposing a monotonicity constraint on the GAM method are the preferred options for optimizing calibration performance.

7. Conclusion

Direct marketing receives considerable attention these days, in academia as well as in business, due to a serious drop in the cost of IT equipment and the ever-increasing usage of response models in a variety of business settings. In a direct marketing context, a discrepancy sometimes exists between the prior distributions of the training set and the scoring set, which is problematic. This may happen because the training set consists entirely of customers previously selected by a response model, and thus contains a higher percentage of responders. Applying a classification model built on this training set to the complete set of customers will harm the estimation of the response probabilities.
Thoroughly adjusting the posterior probabilities to the real response probability distribution will improve the classification performance. This study reveals that the non-linear probability-mapping approaches are among the best performing algorithms, and their usage is highly recommended in a day-to-day business setting for the following reasons. Firstly, the non-linear probability-mapping approaches deliver a better performance than the other calibration algorithms included in this research paper, so the calibrated probabilities better reflect the true probabilities of response. Secondly, it is possible to visualize the relationship between P_kb,org and P_kb,real, which gives managers a better, visual understanding of the calibration process for a particular setting. For instance, the further the calibration curve lies from the 45° line (i.e. the line where P_kb,org = P_kb,real and no calibration is necessary), the higher the added value of sending a leaflet, because the incidence in TRAIN_M is higher than in REAL_N. Finally, the underlying techniques like generalized linear models and generalized additive models are easily implementable in today's business environment due to the availability of these classifiers in traditional software packages like SAS and R. While we are confident that our study adds significant value to the literature, valuable directions for future research are identified. Besides the probability-mapping approaches, which map P_kb,org onto P_kb,real, an extensive research project could be dedicated to investigating the impact of integrated calibration approaches, i.e. methods that integrate the calibration process into the initial training phase of classifier C in order to come up with a new classifier C′ which directly outputs calibrated probabilities.
For instance, a workable integrated calibration approach could be a two-stage Bayesian logistic regression that directly outputs calibrated posterior probabilities. To obtain this integrated Bayesian calibration model, the following procedure is proposed. Under the assumption that the commonly-used prior distribution for b_ki is multivariate Gaussian, i.e. p(b_ki) ~ N(b_0, Σ_0), an empirical Bayesian approach could be used to specify the values of b_0 and Σ_0 by fitting a Bayesian logistic regression to TRAIN_kM using non-informative priors. The resulting posterior mean vector and variance-covariance matrix of this initial model could then be used as the values of b_0 and Σ_0 for a second Bayesian logistic regression on REAL_kN. The resulting integrated Bayesian logistic regression C′ would directly output adapted, calibrated posterior probabilities.¹ Furthermore, the probability-mapping approaches are validated here in a direct marketing setting; future research efforts could investigate their external validity in other operational research settings.

¹ Nevertheless, this approach is not tested in the current version of the paper for confidentiality reasons.

Acknowledgements

The authors would like to thank the anonymous company for freely distributing the datasets. We would like to thank our friendly reviewers and the journal reviewers for their fruitful comments on earlier versions of this paper, and the editor, Jesus Artalejo, for guiding this paper through the reviewing process.

References

Allenby, G.M., Leone, R.P., Jen, L.C. A dynamic model of purchase timing with application to direct marketing. Journal of the American Statistical Association 94.
Baesens, B., Viaene, S., Van den Poel, D., Vanthienen, J., Dedene, G. Bayesian neural network learning for repeat purchase modeling in direct marketing. European Journal of Operational Research 138.
Baumgartner, B., Hruschka, H. Allocation of catalogs to collective customers based on semiparametric response models. European Journal of Operational Research 162.
Bose, I., Chen, X. Quantitative models for direct marketing: A review from systems perspective. European Journal of Operational Research 195.
Bosio, S., Righini, G. Computational approaches to a combinatorial optimization problem arising from text classification. Computers and Operations Research 34.
Bult, J.R. Semiparametric versus parametric classification models: An application to direct marketing. Journal of Marketing Research 30.
Conforti, D., Guido, R. Kernel based support vector machine via semidefinite programming: Application to medical diagnosis. Computers and Operations Research 37.
Crone, S.F., Lessmann, S., Stahlbock, R. The impact of preprocessing on data mining: An evaluation of classifier sensitivity in direct marketing. European Journal of Operational Research 173.
Deichmann, J., Eshghi, A., Haughton, D., Sayek, S., Teebagy, N. Application of multiple adaptive regression splines (MARS) in direct response modeling. Journal of Interactive Marketing 16.
Demšar, J. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7.
Denlinger, C.G. Elements of Real Analysis. Jones and Bartlett Publishers.
Dunn, O.J. Multiple comparisons among means. Journal of the American Statistical Association 56.
Friedman, M. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association 32.
Friedman, M. A comparison of alternative tests of significance for the problem of m rankings. The Annals of Mathematical Statistics 11.
Green, P.J., Silverman, B.W. Nonparametric Regression and Generalized Linear Models. Chapman and Hall/CRC Press.
Gu, C., Wahba, G. Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM Journal of Scientific and Statistical Computing 12.
Hastie, T., Tibshirani, R. Generalized additive models. Statistical Science 1.
Hastie, T., Tibshirani, R. Generalized additive models: Some applications. Journal of the American Statistical Association 82.
Hastie, T., Tibshirani, R. Generalized Additive Models. Chapman and Hall, London.
Hastie, T., Tibshirani, R., Friedman, J. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York.
Haughton, D., Oulabi, S. Direct marketing modeling with CART and CHAID. Journal of Direct Marketing 11.
Hruschka, H. Considering endogeneity for optimal catalog allocation in direct marketing. European Journal of Operational Research 206.
Kim, H.S., Sohn, S.Y. Support vector machines for default prediction of SMEs based on technology credit. European Journal of Operational Research 201.
Lamb, C.W., Hair, J.F., McDaniel, C. Principles of Marketing, second ed. South-Western Publishing Co., Cincinnati.
Lee, H.J., Shin, H., Hwang, S.S., Cho, S., MacLachlan, D. Semi-supervised response modeling. Journal of Interactive Marketing 24.
Martens, D., Baesens, B., Van Gestel, T., Vanthienen, J. Comprehensible credit scoring models using rule extraction from support vector machines. European Journal of Operational Research 183.
Martens, D., Van Gestel, T., De Backer, M., Haesen, R., Vanthienen, J., Baesens, B. Credit rating prediction using Ant Colony Optimization. Journal of the Operational Research Society 61.
Morales, D.R., Wang, J.B. Forecasting cancellation rates for services booking revenue management using data mining. European Journal of Operational Research 202.
Naik, P.A., Hagerty, M.R., Tsai, C.L. A new dimension reduction approach for data-rich marketing environments: Sliced inverse regression. Journal of Marketing Research 37.
Neslin, S.A., Gupta, S., Kamakura, W., Lu, J.X., Mason, C.H. Defection detection: Measuring and understanding the predictive accuracy of customer churn models. Journal of Marketing Research 43.
Paleologo, G., Elisseeff, A., Antonini, G. Subagging for credit scoring models. European Journal of Operational Research 201.
Piersma, N., Jonker, J.J. Determining the optimal direct mailing frequency. European Journal of Operational Research 158.
Saerens, M., Latinne, P., Decaestecker, C. Adjusting the outputs of a classifier to new a priori probabilities: A simple procedure. Neural Computation 14.
Tabachnick, B.G., Fidell, L.S. Using Multivariate Statistics. Harper Collins Publishers, New York.
Wahba, G. Spline Models for Observational Data. Society for Industrial and Applied Mathematics (SIAM), Capital City Press, Montpelier, Vermont.
Wood, S.N. Modelling and smoothing parameter estimation with multiple quadratic penalties. Journal of the Royal Statistical Society B 62.
Wood, S.N. Stable and efficient multiple smoothing parameter estimation for generalized additive models. Journal of the American Statistical Association 99.
Wood, S.N. Fast stable direct fitting and smoothness selection for generalized additive models. Journal of the Royal Statistical Society B 70.
Zahavi, J., Levin, N. Applying neural computing to target marketing. Journal of Direct Marketing 11.


More information

On the effect of data set size on bias and variance in classification learning

On the effect of data set size on bias and variance in classification learning On the effect of data set size on bias and variance in classification learning Abstract Damien Brain Geoffrey I Webb School of Computing and Mathematics Deakin University Geelong Vic 3217 With the advent

More information

Smoothing and Non-Parametric Regression

Smoothing and Non-Parametric Regression Smoothing and Non-Parametric Regression Germán Rodríguez [email protected] Spring, 2001 Objective: to estimate the effects of covariates X on a response y nonparametrically, letting the data suggest

More information

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification

More information

11 Linear and Quadratic Discriminant Analysis, Logistic Regression, and Partial Least Squares Regression

11 Linear and Quadratic Discriminant Analysis, Logistic Regression, and Partial Least Squares Regression Frank C Porter and Ilya Narsky: Statistical Analysis Techniques in Particle Physics Chap. c11 2013/9/9 page 221 le-tex 221 11 Linear and Quadratic Discriminant Analysis, Logistic Regression, and Partial

More information

Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management

Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management Paper Jean-Louis Amat Abstract One of the main issues of operators

More information

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Xavier Conort [email protected] Motivation Location matters! Observed value at one location is

More information

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing [email protected]

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing [email protected] IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way

More information

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate

More information

Data Mining Methods: Applications for Institutional Research

Data Mining Methods: Applications for Institutional Research Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014

More information

Studying Auto Insurance Data

Studying Auto Insurance Data Studying Auto Insurance Data Ashutosh Nandeshwar February 23, 2010 1 Introduction To study auto insurance data using traditional and non-traditional tools, I downloaded a well-studied data from http://www.statsci.org/data/general/motorins.

More information

European Journal of Operational Research

European Journal of Operational Research European Journal of Operational Research 197 (2009) 402 411 Contents lists available at ScienceDirect European Journal of Operational Research journal homepage: www.elsevier.com/locate/ejor Interfaces

More information

Machine Learning in Spam Filtering

Machine Learning in Spam Filtering Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov [email protected] Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.

More information

Handling attrition and non-response in longitudinal data

Handling attrition and non-response in longitudinal data Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

More information

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

More information

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical

More information

Predicting Student Performance by Using Data Mining Methods for Classification

Predicting Student Performance by Using Data Mining Methods for Classification BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 1 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0006 Predicting Student Performance

More information

Model selection in R featuring the lasso. Chris Franck LISA Short Course March 26, 2013

Model selection in R featuring the lasso. Chris Franck LISA Short Course March 26, 2013 Model selection in R featuring the lasso Chris Franck LISA Short Course March 26, 2013 Goals Overview of LISA Classic data example: prostate data (Stamey et. al) Brief review of regression and model selection.

More information

Efficiency in Software Development Projects

Efficiency in Software Development Projects Efficiency in Software Development Projects Aneesh Chinubhai Dharmsinh Desai University [email protected] Abstract A number of different factors are thought to influence the efficiency of the software

More information

Predict the Popularity of YouTube Videos Using Early View Data

Predict the Popularity of YouTube Videos Using Early View Data 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Bootstrapping Big Data

Bootstrapping Big Data Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu

More information