Customer Lifetime Value Measurement using Machine Learning Techniques. Tarun Rathi. Mathematics and Computing. Department of Mathematics


Customer Lifetime Value Measurement using Machine Learning Techniques

Tarun Rathi
Mathematics and Computing, Department of Mathematics
Indian Institute of Technology (IIT), Kharagpur

Project guide: Dr. V. Ravi
Associate Professor, IDRBT
Institute for Development and Research in Banking Technology (IDRBT)
Road No. 1, Castle Hills, Masab Tank, Hyderabad

July 8, 2011

Certificate

Date: July 8, 2011

This is to certify that the project report entitled Customer Lifetime Value Measurement using Machine Learning Techniques, submitted by Mr. TARUN RATHI, 3rd-year student in the Department of Mathematics, enrolled in its 5-year integrated M.Sc. course of Mathematics and Computing, Indian Institute of Technology, Kharagpur, is a record of bona fide work carried out by him under my guidance during the period May 6, 2011 to July 8, 2011 at the Institute for Development and Research in Banking Technology (IDRBT), Hyderabad. The project work is a research study, which has been successfully completed as per the set of objectives. I observed Mr. TARUN RATHI to be sincere, hardworking and to have the capability and aptitude for independent research work. I wish him every success in his life.

Dr. V. Ravi
Associate Professor, IDRBT
Supervisor

Declaration by the candidate

I declare that the summer internship project report entitled Customer Lifetime Value Measurement using Machine Learning Techniques is my own work, conducted under the supervision of Dr. V. Ravi at the Institute for Development and Research in Banking Technology, Hyderabad. I have put in 64 days of attendance with my supervisor at IDRBT and was awarded a project fellowship. I further declare that, to the best of my knowledge, the report does not contain any part of any work which has been submitted for the award of any degree either by this institute or by any other university without proper citation.

Tarun Rathi
III yr. Undergraduate Student
Department of Mathematics
IIT Kharagpur
July 8, 2011

Acknowledgement

I would like to thank Mr. B. Sambamurthy, director of IDRBT, for giving me this opportunity. I gratefully acknowledge the guidance of Dr. V. Ravi, who helped me sort out all the problems in concept clarification, and without whose support the project would not have reached its present state. I would also like to thank Mr. Naveen Nekuri for his guidance and sincere help in understanding important concepts, and also in the development of the WNN software.

Tarun Rathi
III yr. Undergraduate Student
Department of Mathematics
IIT Kharagpur
July 8, 2011

Abstract: Customer Lifetime Value (CLV) is an important metric in relationship marketing approaches. There have long been traditional techniques like Recency, Frequency and Monetary Value (RFM), Past Customer Value (PCV) and Share-of-Wallet (SOW) for segregating customers into good or bad, but these are not adequate, as they segment customers only on their past contribution. CLV, on the other hand, calculates the future value of a customer over his or her entire lifetime, which means it takes into account the prospect of a bad customer becoming good in the future, and hence profitable for a company or organisation. In this paper, we review the various models and the different techniques used in the measurement of CLV. Towards the end, we make a comparison of various machine learning techniques like Classification and Regression Trees (CART), Support Vector Machines (SVM), SVM using SMO, Additive Regression, the K-Star method, Multilayer Perceptron (MLP) and Wavelet Neural Network (WNN) for the calculation of CLV.

Keywords: Customer Lifetime Value (CLV), RFM, Share-of-Wallet (SOW), Past Customer Value (PCV), machine learning techniques, data mining, Support Vector Machines, Sequential Minimal Optimization (SMO), Additive Regression, K-Star method, Artificial Neural Networks (ANN), Multilayer Perceptron (MLP), Wavelet Neural Network (WNN).

Contents

Certificate
Declaration by the candidate
Acknowledgement
Abstract
1. Introduction
2. Literature Review
   2.1 Aggregate Approach
   2.2 Individual Approach
3. Models and Techniques to calculate CLV
   3.1 RFM Models
   3.2 Computer Science and Stochastic Models
   3.3 Growth/Diffusion Models
   3.4 Econometric Models
   3.5 Some other Modelling Approaches
4. Estimating Future Customer Value using Machine Learning Techniques
   4.1 Data Description
   4.2 Models and Software Used (SVM, Additive Regression and K-Star, MLP, WNN, CART)
   4.3 Results and Comparison of Models
5. Conclusion and Directions of future research
References

1. Introduction: Customer Lifetime Value has become a very important metric in Customer Relationship Management. Firms are increasingly relying on CLV to manage and measure their business. CLV is a disaggregate metric that can be used to identify customers who can be profitable in the future, and hence to allocate resources accordingly (Kumar and Reinartz, 2006). Besides, the CLV of current and future customers is also a good measure of the overall value of a firm (Gupta, Lehmann and Stuart, 2004). There have been other measures as well which are fairly good indicators of customer loyalty, like Recency, Frequency and Monetary Value (RFM), Past Customer Value (PCV) and Share-of-Wallet (SOW). The customers who are more recent and have a high frequency and total monetary contribution are said to be the best customers in this approach. However, it is possible that a star customer of today may not be the same tomorrow. Malthouse and Blattberg (2005) give examples showing that a customer who is good at a certain point may not remain good later, and that a bad customer may turn good, for instance after a change of job. Past Customer Value (PCV), on the other hand, calculates the total previous contribution of a customer, adjusted for the time value of money. Again, PCV does not take into account the possibility of a customer being active in the future (V. Kumar, 2007). Share-of-Wallet is another metric of customer loyalty, which takes into account the brand preference of a customer: it measures the amount that a customer spends on a particular brand against other brands. However, it is not always possible to get the details of a customer's spending on other brands, which makes the calculation of SOW a difficult task. A common disadvantage these models share is the inability to look forward; hence they do not consider the prospect of a customer being active in the future.
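As a concrete illustration of the traditional metrics above, here is a minimal sketch of computing R, F, M and PCV. The transaction history and the monthly compounding rate are invented for illustration; nothing here comes from the report's dataset.

```python
from datetime import date

# Toy transaction history (dates and amounts are invented for illustration).
transactions = [(date(2010, 11, 3), 120.0),
                (date(2011, 2, 15), 80.0),
                (date(2011, 6, 20), 150.0)]
today = date(2011, 7, 8)

recency_days = (today - max(t for t, _ in transactions)).days  # Recency
frequency = len(transactions)                                  # Frequency
monetary = sum(a for _, a in transactions)                     # Monetary value

# Past Customer Value: past contributions compounded forward to today at an
# assumed monthly rate d, so that older purchases weigh more heavily the
# more recently they occurred.
d = 0.01
pcv = sum(a * (1 + d) ** ((today - t).days / 30.0) for t, a in transactions)

print(recency_days, frequency, monetary, round(pcv, 2))
```

Note that all four numbers look only backwards: none of them says anything about whether this customer will still be active next year, which is exactly the gap CLV addresses.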
The calculation of the probability of a customer being active in the future is a very important part of CLV calculation, and it is what differentiates CLV from these traditional metrics of customer loyalty. It is very important for a firm to know whether a customer will continue his relationship with it in the future or not. CLV helps firms understand the future behaviour of a customer and thus enables them to allocate their resources accordingly. Customer Lifetime Value is defined as the present value of all future profits obtained from a customer over his or her entire lifetime of relationship with the firm (Berger and Nasr, 1998). A very basic model to calculate the CLV of a customer is (V. Kumar, 2007):

CLV_i = Σ_{t=1}^{T} CM_{it} / (1+d)^t

where i is the customer index, t is the time index, T is the number of time periods considered for estimating CLV, CM_{it} is the contribution margin of customer i in period t, and d is the discount rate.
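The basic discounting formula above can be sketched as follows; the projected margins and discount rate are assumed values for illustration:

```python
# Minimal sketch of the basic CLV formula: discount a stream of projected
# per-period contribution margins CM_t at rate d. Values are illustrative.
def clv(margins, d):
    # CLV = sum over t = 1..T of CM_t / (1 + d)^t
    return sum(cm / (1 + d) ** t for t, cm in enumerate(margins, start=1))

projected_margins = [100.0, 100.0, 100.0]  # CM_t for T = 3 periods (assumed)
print(round(clv(projected_margins, d=0.10), 2))
```

With a 10% discount rate, three future margins of 100 are worth about 248.69 today rather than 300, which is the whole point of discounting future contributions.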

There are various models to calculate the CLV of a customer or a cohort of customers, depending on the amount of data available and the type of company. V. Kumar (2007) has shown an individual-level approach and an aggregate-level approach to calculating CLV, and has linked CLV to Customer Equity (CE), which is simply the average CLV of a cohort of customers. Dwyer (1997) used a customer migration model to take into account the repeat-purchase behaviour of customers. Various behaviour-based models like logit models and multivariate probit models have also been used (Donkers, Verhoef and Jong, 2007), as have models that take into account the relationship between components of CLV like customer acquisition and retention (Thomas, 2001). We present some of the most used models for calculating CLV in the later part of the paper. Besides this, there are various techniques used to calculate CLV or the parameters needed to calculate it. Aeron, Kumar and Janakiraman (2010) have presented various parameters that may be useful in the calculation of CLV, including acquisition rate, retention rate, add-on selling rate, purchase probability, purchase amount, discount rate, referral rate and cost factor. However, all of these parameters may not be required in a single model. Various researchers have used different techniques to calculate these parameters. Hansotia and Wang (1997) used logistic regression, Malthouse and Blattberg (2005) used linear regression for predicting future cash flows, Dries and Van den Poel (2009) used quantile regression, and Haenlein et al. (2007) used CART and a Markov chain model to calculate CLV. An overview of the various data mining techniques used to calculate the parameters for CLV has been compiled by Aeron, Kumar and Janakiraman (2010).
Besides this, many researchers also use models like Pareto/NBD, BG/NBD, MBG-NBD, CBG-NBD, probit, tobit, ARIMA, Support Vector Machines, Kohonen networks etc. to calculate CLV. Malthouse (2009) presents a list of the methods used by academicians and researchers who participated in the Lifetime Value and Customer Equity Modelling Competition. Most of the above-mentioned models are used either to calculate the variables used to predict CLV or to find a relationship between them. In our research, we have used several nonlinear techniques, namely Classification and Regression Trees (CART), Support Vector Machines (SVM), SVM using SMO, Additive Regression, the K-Star method, Multilayer Perceptron (MLP) and Wavelet Neural Network (WNN), to calculate CLV; these capture the relationships among the variables that act as inputs in the prediction of CLV. Further, we compare these techniques to find the best-fitting model for the dataset we used. Later on, we draw conclusions and discuss areas of future research.

2. Literature Review: Before going into the details of the various models of CLV, let us first look at the approaches designed for calculating it. CLV calculation can broadly be classified in 2 ways: a) Aggregate Approach, b) Individual Approach.

2.1 Aggregate Approach: This approach revolves around calculating the Customer Equity (CE) of a firm. Customer Equity is simply the average CLV of a cohort of customers. Various researchers have devised different ways to calculate the CE of a firm. Gupta, Lehmann and Stuart

(2004) have calculated CE by summing up the CLV of all customers and taking its average. Berger and Nasr (1998) calculated CE from the lifetime value of a customer segment, taking into account the rate of retention and the average acquisition cost per customer:

Avg. CLV = GC · Σ_{t=0}^{n} r^t / (1+d)^t − A

Here, GC is the average gross contribution per customer per period, r = rate of retention, d = discount rate, and A = avg. acquisition cost per customer.

Kumar and Reinartz (2006) gave a formula for calculating the retention rate for a customer segment as follows:

Retention rate (%) = (No. of customers in the cohort buying in period t) / (No. of customers in the cohort buying in period t−1) × 100

Projecting the retention rate:

R_t = R_max (1 − e^{−rt})

Here, R_t is the predicted retention rate for a given period t in the future, R_max is the maximum attainable retention rate, given by the firm, and r is the coefficient of retention, calculated as r = (1/t) · (ln(R_max) − ln(R_max − R_t)).

This model is good enough for calculating the CLV of a segment of customers over a small period of time; however, the fluctuation of the retention rate and the gross contribution margin needs to be taken care of when projecting CLV over longer periods. Taking this into account, they proposed another model with a profit function over time, which can be calculated separately. This model is given as:

CLV = Σ_{t=1}^{T} π(t) · R_t / (1+d)^t

where π(t) is the profit function over time.

Blattberg, Getz and Thomas (2001) calculated average CLV, or CE, as the sum of the return on acquisition, the return on retention and the return on add-on selling across the entire customer base. They summarized the formula as:

CE(t) = Σ_{i=1}^{I} [ N_{i,t} α_{i,t} (S_{i,t} − c_{i,t}) − N_{i,t} B_{i,a,t} + N_{i,t} α_{i,t} Σ_{k=1}^{∞} ρ_{i,t+k} (S_{i,t+k} − c_{i,t+k} − B_{i,r,t+k} − B_{i,AO,t+k}) (1/(1+d))^k ]

where,

CE(t) is the customer equity value for customers acquired at time t, N_{i,t} is the number of potential customers at time t for segment i, α_{i,t} is the acquisition probability at time t for segment i, ρ_{i,t} is the retention probability at time t for a customer in segment i, B_{i,a,t} is the marketing cost per prospect (N) for acquiring customers for segment i, B_{i,r,t} is the marketing cost in time period t for retained customers of segment i, B_{i,AO,t} is the marketing cost in time period t for add-on selling for segment i, d is the discount rate, S_{i,t} is the sales of the products/services offered by the firm at time t for segment i, c_{i,t} is the cost of goods at time t for segment i, I is the number of segments, i is the segment designation, and t_0 is the initial time period.

Rust, Lemon and Zeithaml (2004) used a CLV model in which they considered the case where a customer switches between different brands. However, to use this model one needs a customer base that provides information about previous brands purchased, the probability of purchasing different brands, and so on. Here the CLV of customer i to brand j is given as:

CLV_{ij} = Σ_{t=1}^{T_i} (1/(1+d_j))^{t/f_i} · v_{ijt} · π_{ijt} · B_{ijt}

where T_i is the number of purchases customer i makes during the specified time period, d_j is firm j's discount rate, f_i is the average number of purchases customer i makes in a unit of time (e.g. per year), v_{ijt} is customer i's expected purchase volume of brand j in purchase t, π_{ijt} is the expected contribution margin per unit of brand j from customer i in purchase t, and B_{ijt} is the probability that customer i buys brand j in purchase t.

The Customer Equity (CE) of firm j is then calculated as the mean CLV of all customers across all firms, multiplied by the total number of customers in the market across all brands.

2.2 Individual Approach: In this approach, CLV is calculated for an individual customer as the sum of cumulated cash flows of a customer over his or her entire lifetime, discounted using the WACC (weighted average cost of capital) (Kumar and George, 2007). The CLV in this case depends on the activity of the customer, or his expected number of purchases during the prediction period, and also on his expected contribution margin. The basic formula for CLV in this approach is:

CLV_i = Σ_{t=1}^{T} GC_{it} / (1+d)^t

where GC_{it} is the gross contribution margin for customer i in period t. This approach brings to light the need for calculating the probability of a customer being active, or P(Active). There are various ways to calculate P(Active). V. Kumar (2007) has calculated P(Active) as:

P(Active) = (T/N)^n

where n is the number of purchases in the observation period, T is the time elapsed between acquisition and the most recent purchase, and N is the time elapsed between acquisition and the period for which P(Active) needs to be calculated. This model, however, is quite trivial. Several researchers have used statistically advanced methods to calculate P(Active) or the expected frequency of purchase. Most of them have also taken into account other factors like channel communication, recency of purchase, customer characteristics, switching costs, first contribution margin etc. to make the predictions more accurate. Venkatesan and Kumar (2004), in their approach to calculating CLV, predicted the customer's purchase frequency based on past purchases. The CLV function in this case is represented as:

CLV_i = Σ_{y=1}^{T_i} CM_{i,y} / (1+r)^{y/frequency_i} − Σ_{l=1}^{n} Σ_m c_{i,m,l} · x_{i,m,l} / (1+r)^l

where CLV_i is the lifetime value of customer i, CM_{i,y} is the contribution margin from customer i on purchase occasion y, r is the discount rate, c_{i,m,l} is the unit marketing cost for customer i in channel m in year l, x_{i,m,l} is the number of contacts to customer i in channel m in year l, frequency_i is the predicted purchase frequency for customer i, n is the number of years to forecast, and T_i is the predicted number of purchases made by customer i until the end of the planning period.

Besides this, there have been various other models and techniques for calculating P(Active) or the expected frequency of purchase, including Pareto/NBD, BG/NBD, MBG-NBD, CBG-NBD, probit, tobit, the generalized gamma distribution and the log-normal distribution. Various researchers and academicians who participated in the 2008 DMEF CLV Modelling Competition used some of these models to calculate CLV. We return to these in the next part of the paper, when we study the various models and techniques used by researchers to calculate the parameters of CLV, or CLV itself.

As we have seen, there are various aggregate and disaggregate approaches to calculating CLV. The obvious question one comes across is which model to use. Kumar and George (2007) give a detailed comparison of these models. They observed that an aggregate approach performs poorly in terms of time to implement and expected benefits, while a disaggregate approach has higher data requirements and more metrics to track. They also concluded that model selection should depend on the requirements of the firm and on which criteria it gives more importance to than others: one firm may consider the cost involved the important factor, while another may consider expected profits the major factor. Kumar and George (2007) have also proposed an integrated, or hybrid, approach to calculating CLV, in which an appropriate approach is adopted depending on the details available for each customer.
If the firm's transaction data and firm-customer interaction data are available, the individual-level approach of Venkatesan and Kumar (2004) is adopted. If this data is not available but segment-level data is available, the approach of Blattberg, Getz and Thomas (2001) is adopted. If size-of-wallet information for customers is not available but survey data is available, the approach of Rust, Lemon and Zeithaml (2004) is adopted.
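The simple P(Active) heuristic from the individual-level approach above can be sketched as follows; the customer's numbers are assumed for illustration:

```python
# Sketch of the simple heuristic P(Active) = (T/N)^n described above, where
# n = purchases observed, T = time from acquisition to last purchase and
# N = time from acquisition to the point of evaluation.
def p_active(n, T, N):
    return (T / N) ** n

# A customer with 4 purchases, last seen 20 months after acquisition,
# evaluated at month 24 (assumed numbers):
print(round(p_active(4, 20, 24), 3))
```

The longer the gap since the last purchase (T far below N) and the more regular the customer used to be (large n), the faster P(Active) falls, which matches the intuition that silence from a frequent buyer is more alarming than silence from a rare one.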

3. Models and Techniques to calculate CLV: There are various models to calculate CLV. Most of them calculate the parameters needed to measure CLV using different models and then combine these into a new method for calculating CLV. For example, Fader, Hardie and Lee (2005) captured recency and frequency in one model to calculate the expected number of purchases and built another model to calculate the monetary value. Reinartz, Thomas and Kumar (2005) captured customer acquisition and retention simultaneously. Gupta et al. (2006) have given a good review of CLV modelling; we use some of their modelling categories in this paper, with more examples and discussion.

3.1 RFM Models: RFM models have been in use in direct marketing for more than 30 years. These models are the most common in industry because of their ease of use. They are based on three levels of information about customers: their recency, frequency and monetary contribution. Fader, Hardie and Lee (2005) have shown that the RFM variables can be used to build a CLV model and that they are sufficient statistics for their CLV model. We now briefly present two RFM-based models used to determine CLV.

Weighted RFM Model: Mahboubeh Khajvand and Mohammad Jafar Tarokh (2010) presented this model for estimating customer future value based on data from an Iranian bank. They took the raw data and calculated the recency, frequency and monetary value of each customer. Using clustering techniques such as K-means clustering, they segmented the data into various groups and calculated the CLV for each cluster using the following formula:

CLV = w_R · R + w_F · F + w_M · M

where w_R, w_F and w_M are the weights of recency, frequency and monetary value, obtained by the AHP method based on expert judgment. The key limitation of this modelling approach is that it is a scoring model rather than a CLV model. It divides customers into various segments and then calculates a score for each segment.
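A minimal sketch of such a weighted-RFM score follows; the weights and the customer's pre-normalized R, F, M values are assumed for illustration, not taken from the paper:

```python
# Sketch of a weighted-RFM score: normalized R, F, M values combined with
# AHP-derived weights. The weights and customer values here are assumed.
w_r, w_f, w_m = 0.2, 0.3, 0.5  # must sum to 1

def rfm_score(r, f, m):
    # r, f, m are already normalized to [0, 1]; higher means better
    return w_r * r + w_f * f + w_m * m

print(round(rfm_score(r=0.9, f=0.5, m=0.7), 2))
```

The output is a rank-ordering score, not a monetary amount, which is precisely the limitation discussed above.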
Such models do not actually provide a dollar value for each customer. To overcome this, Khajvand and Tarokh (2010) proposed a multiplicative seasonal ARIMA (Auto-Regressive Integrated Moving Average) method to calculate CLV, which is a time-series prediction method. The multiplicative seasonal ARIMA(p,d,q)×(P,D,Q)_s model, where p = order of the autoregressive process, d = order of the differencing operator, q = order of the moving-average process,

P = order of the seasonal autoregressive process, D = order of the seasonal differencing operator and Q = order of the seasonal moving-average process, can be represented by:

φ_p(B) Φ_P(B^s) ∇^d ∇_s^D x_t = θ_q(B) Θ_Q(B^s) ε_t

where φ_p(B) is the autoregressive operator, θ_q(B) is the moving-average operator, ∇^d is the d-fold differencing operator, used to change a nonstationary time series into a stationary one, Φ_P(B^s) and Θ_Q(B^s) are the seasonal autoregressive and moving-average operators, and ∇_s^D is the D-fold seasonal differencing operator. The main limitation of this model was that, due to lack of data, it predicted the future value of customers in the next interval only.

RFM and CLV using iso-value curves: Fader, Hardie and Lee (2005) proposed this model to calculate CLV. They showed that no information other than the RFM characteristics is required to formulate it. They used the "lost for good" approach, meaning that customers who end their relationship with a firm never come back, and also assumed that M is independent of R and F. This suggests that the value per transaction can be factored out, so we can forecast the flow of future transactions and then rescale this number of discounted expected transactions (DET) by a monetary value (a multiplier) to yield a dollar number for each customer. The model is formulated as:

CLV = margin × revenue/transaction × DET

The calculation of DET is the most important part of this model. Fader, Hardie and Lee (2005) first calculated DET for a customer with observed behaviour (X = x, t_x, T) as:

Here, the numerator is the expected number of transactions in period t and d is the discount rate. However, according to Blattberg, Getz and Thomas (2001), this calculation of CLV has the following problems: a) we do not know the time horizon for projecting the sales, b) it is unclear which time periods to measure, and c) the expression ignores the specific timing of transactions. Hence they used the Pareto/NBD model with a continuous-time formulation, instead of a discrete-time formulation, to compute DET (and thus CLV) over an infinite time horizon. The resulting DET is a closed-form expression in the Pareto/NBD parameters r, α, s and β, involving Ψ(·), the confluent hypergeometric function of the second kind, and L(·), the Pareto/NBD likelihood function.

They then added a general model of monetary value to arrive at a dollar value of CLV, assuming that a customer's individual transaction values vary around his or her average transaction value. After checking various distributions, they found that the gamma distribution best fitted their data, and calculated the expected average transaction value for a customer with an average spend of m̄_x across x transactions. This expected monetary value, multiplied by DET, gave the CLV of a customer. Following this, graphs called iso-value curves were drawn to identify customers with different purchase histories but similar CLVs: CLV vs. frequency, CLV vs. recency, CLV vs. frequency and recency, etc. The key limitations of this model are that it is based on a noncontractual purchase setting and that it is not immediately clear which distributions should be used for transaction incidence and transaction size.

3.2 Computer Science and Stochastic Models: These types of models are primarily based on data mining, machine learning, nonparametric statistics and other approaches that emphasize predictive ability.
These include neural network models, projection-pursuit models, decision tree models and spline-based models (Generalized Additive Models (GAM), Classification and Regression Trees (CART), Support Vector Machines (SVM), etc.). Various researchers have used these techniques to calculate CLV. Haenlein et al. (2007) used a model based on CART and first-order Markov chains to calculate CLV, with data from a retail bank. First, they fed the various profitability drivers as predictor variables, together with the target variables, into a CART analysis to build a regression tree. This tree helped them cluster the customer base into a set of homogeneous subgroups. They used these subgroups as discrete states and estimated a transition matrix

which describes movements between them, using Markov chains. To estimate the corresponding transition probabilities, they determined the state each customer belonged to at the beginning and end of a predefined time interval T, using the decision rules resulting from the CART analysis. In the final step, the CLV of each customer group was determined as the discounted sum of state-dependent contribution margins, weighted by the corresponding transition probabilities:

CLV = Σ_{t=0}^{T} P^t CM / (1+d)^t

where P is the matrix of probabilities of transition from one state to another, CM is the vector of contribution margins of the states, and d is the discount rate. Finally, they studied the CLVs of each customer segment in order to carry out marketing strategies for each segment. This model, however, has some limitations. It was assumed that client behaviour follows a first-order Markov process, which disregards the behaviour of earlier periods as insignificant. It was also assumed that the transition matrix is stable and constant over time, which seems inappropriate for long-term forecasts, and the possibility of brand switching in customer behaviour is not taken into account.

Malthouse and Blattberg (2005) used linear regression to calculate CLV. The CLV in this case is related to the predictor variables x_i through some regression function f as

g(CLV_i) = f(x_i) + ε_i

where the ε_i are independent random variables with mean 0 and error variance V(ε_i) = σ², and the invertible function g is a variance-stabilizing transformation. Several regression models can be considered for this function: a) linear regression with variance-stabilizing transformations, estimated with ordinary least squares; b) linear regression estimated with iteratively re-weighted least squares (IRLS); c) a feedforward neural network (estimated, in their case, with S-Plus). Methods like k-fold cross-validation are used to check the quality of the fit. Dries and Van den Poel (2009) used quantile regression instead of linear regression to calculate CLV.
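The CART-plus-Markov-chain calculation described above can be sketched with an assumed two-state transition matrix and assumed per-state margins; in the paper, the states and probabilities come from the CART segmentation of a real customer base.

```python
import numpy as np

# Toy version of the Markov-chain step: two states (e.g. CART segments),
# an assumed transition matrix P and per-state contribution margins cm.
P = np.array([[0.8, 0.2],     # P[i, j] = probability of moving state i -> j
              [0.3, 0.7]])
cm = np.array([100.0, 20.0])  # per-period contribution margin by state
d, T = 0.10, 5                # discount rate and horizon (assumed)

# CLV by starting state: sum over t of (P^t @ cm) / (1 + d)^t
clv_by_state = sum(np.linalg.matrix_power(P, t) @ cm / (1 + d) ** t
                   for t in range(1, T + 1))
print(np.round(clv_by_state, 2))
```

Note how the low-margin state still earns a substantial CLV, because the chain gives it a chance of migrating into the high-margin state later; this is exactly what the backward-looking RFM metrics miss.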
It extends the mean regression model to conditional quantiles of the response variable, such as the median, and provides insights into the effects of the covariates on the conditional CLV distribution that may be missed by the least-squares method. For predicting the top x percent of customers, quantile regression is a better

method than linear regression, and the smaller the top segment of interest, the better its relative predictive performance. Besides these, other data mining techniques like Decision Trees (DT), Artificial Neural Networks (ANN), Genetic Algorithms (GA), fuzzy logic and Support Vector Machines (SVM) are also in use, though mostly to calculate CLV-related metrics like customer churn, acquisition rate and customer targeting. Among DTs the most common are C4.5, CHAID, CART and SLIQ. ANNs have been used to capture nonlinear patterns in data, and they can be used for both classification and regression purposes depending on the activation function. Malthouse and Blattberg (2005) used an ANN to predict future cash flows. Aeron and Kumar (2010) mention different approaches to using ANNs. The first is the generalized stacking approach of Hu and Tsoukalas (2003), an ensemble method in which the data is divided into three groups: the first group has all situational variables, the second all demographic variables and the third both situational and demographic variables. The other is the hybrid GA/ANN approach of Kim and Street (2004) for customer targeting, in which the GA searches the exponential space of features and passes one subset of features to the ANN; the ANN extracts predictive information from each subset and learns its patterns, and is then evaluated on a data set, returning metrics to the GA. ANNs, too, are not without limitations. They cannot handle too many variables, so various other algorithms like GA, PCA (Principal Component Analysis) and logistic regression are used to select the variables to input to the ANN. There is no set rule for finding ANN parameters; the selection of these parameters is a research area in itself. Moreover, the initial weights of an ANN are chosen randomly, which can lengthen the time taken to reach the desired solution.
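As an illustration of using an ANN as a nonlinear regressor for a CLV-like target, here is a minimal sketch with synthetic data standing in for customer features; the architecture and data are assumptions, not the report's setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for customer features (e.g. recency, frequency,
# monetary value) and a nonlinear CLV-like target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 3))
y = 50 * X[:, 0] + 30 * X[:, 1] ** 2 + 20 * X[:, 2]

# One small hidden layer; lbfgs converges well on small, smooth problems.
mlp = MLPRegressor(hidden_layer_sizes=(16,), solver="lbfgs",
                   max_iter=2000, random_state=0)
mlp.fit(X[:400], y[:400])
print(round(mlp.score(X[400:], y[400:]), 3))  # R^2 on held-out rows
```

The held-out R² checks generalization, echoing the k-fold cross-validation practice mentioned above; the fixed `random_state` addresses the random-initial-weights issue only by making a single run reproducible, not by removing the sensitivity itself.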
Genetic Algorithms (GA) are well suited to optimization problems, as they can achieve a global optimum with quick convergence, especially for high-dimensional problems. GAs have seen varied applications among CLV parameters, such as multiobjective optimization (using the Genetic-Pareto algorithm), churn prediction, customer targeting, cross-selling and feature selection. A GA is either used to predict these parameters or to optimize the parameter selection of other techniques like ANN. Besides GA, fuzzy logic and Support Vector Machines also find applications in predicting churn and loyalty indices. There are many other techniques and models, like GAM (Generalized Additive Models), MARS (Multivariate Adaptive Regression Splines) and Support Vector Machines (SVM), which are used to predict or optimize the various parameters for CLV, such as churn rate, logit and hazard functions, and classification. Churn rate in itself is a very vast area of CRM, which can be used as a parameter in the prediction of CLV and in many other related models. There have been many worldwide competitions and tournaments in which academics and practitioners combine different models to get the best possible results. These approaches remain little known in the marketing literature and leave a lot of scope for further research. The 2008 DMEF CLV Competition was one such competition, in which various researchers and academicians came together to compete on its three tasks. Malthouse

(2009) has compiled the various models that were presented in that competition.

3.3 Growth/Diffusion Models: These types of models focus on calculating the CLV of current and future customers. Forecasting the acquisition of future customers can be done in two ways. The first approach uses disaggregate customer data and builds models that predict the probability of acquiring a particular customer (Thomas, Blattberg and Fox, 2004). The other approach is to use aggregate data and a diffusion or growth model to predict the number of customers a firm is likely to acquire in the future (Gupta, Lehmann and Stuart, 2004). In this approach, the number of new customers acquired at time t is forecast from an S-shaped customer growth curve whose parameters are fitted to historical acquisition data. Using this, they estimated the CE of a firm as:

CE = Σ_k n_k / (1+i)^k · ( m · r/(1+i−r) − c )

where n_k is the number of newly acquired customers for cohort (segment) k, m is the margin, r is the retention rate, i is the discount rate, and c is the acquisition cost per customer. Diffusion models can also be used to assess the value of a lost customer. For example, a bank that has recently adopted a new technology will have some customers who are reluctant to accept the change and will be lost. If the relative proportions of lost customers are known, the value of an average lost customer can be computed from the same projections.

3.4 Econometric Models: Gupta et al. (2006) have given a good review of this type of model. We present the same in brief, with an example of a right-censored tobit model by Hansotia and Wang (1997). Econometric models study customer acquisition, retention and expansion (cross-selling or margin) and combine them to calculate CLV. Customer acquisition and customer retention are the key inputs for such a model. Various models relate customer acquisition and retention and combine them into new models for calculating CLV, for example the right-censored tobit model for CLV (Hansotia and Wang,

1997). It has also been shown by some researchers (Thomas, 2001) that ignoring the link between customer acquisition and retention may cause a 6-50% variation from these models; for example, if we spend less money on acquisition, the customers might walk away soon. Retention models are broadly classified into two main categories: a) the first considers the "lost for good" approach and uses hazard models to predict the probability of customer defection; b) the second considers the "always a share" approach and typically uses Markov models. Hazard models are used to predict the probability of customer defection. They are of two types: a) Accelerated Failure Time (AFT) models (Kalbfleisch and Prentice, 1980) and b) Proportional Hazard (PH) models (Levinthal and Fichman, 1988). AFT models are of the form ln(t_j) = X_j β + σ μ_j, where t_j is the purchase duration for customer j and X_j are covariates. Different specifications of σ and μ_j lead to different models such as the Weibull or generalized gamma model. PH models instead specify the hazard rate in terms of a baseline hazard and covariates X_j as λ(t_j; X_j) = λ_0(t_j) exp(X_j β); different specifications again give models like the exponential, Weibull or Gompertz. Hansotia and Wang (1997) used a right-censored tobit model to calculate the lifetime value of customers, or LTV as it was called then. It is a regression model with right-censored observations and can be estimated by the method of maximum likelihood. The present value of a customer's revenue (PVR) for the qth customer receiving package j was modelled as a function of the (K+1)-dimensional column vector of profile variables for the qth customer. The equation may also be estimated using the LIFEREG procedure in SAS. The likelihood function, which is the probability of observing the sample values, was written with S=1 if observation i is uncensored and 0 otherwise. Besides the four types of models presented in this paper, Gupta et al. (2006) have also mentioned a probability model.
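As an illustration of the proportional-hazards specification above, the following is a minimal sketch with a Weibull baseline hazard; the covariates (recency score, total frequency) and all coefficient values are made-up assumptions for illustration, not estimates from any of the cited studies:

```python
import math

def weibull_baseline_hazard(t, shape=1.5, scale=10.0):
    # Baseline hazard of a Weibull duration model:
    # h0(t) = (shape / scale) * (t / scale) ** (shape - 1)
    return (shape / scale) * (t / scale) ** (shape - 1)

def ph_hazard(t, x, beta):
    # Proportional-hazards defection rate: h(t; x) = h0(t) * exp(x . beta)
    xb = sum(xi * bi for xi, bi in zip(x, beta))
    return weibull_baseline_hazard(t) * math.exp(xb)

# Hypothetical covariates (recency score, total frequency) with assumed coefficients.
beta = [0.4, -0.3]
hazard_a = ph_hazard(6.0, [15.0, 1.0], beta)   # customer with a single purchase
hazard_b = ph_hazard(6.0, [15.0, 10.0], beta)  # frequent buyer, same recency
```

Under these assumed coefficients, the frequent buyer has the lower defection hazard, and the hazard ratio between two customers who differ in one covariate is exp(β) for that covariate, independent of t.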
In our review, these have been covered under the computer science and stochastic models. Gupta et al. (2006) have

made a few assumptions in their review of probability models, for example that the probability of a customer being alive can be characterized by various probability distribution models. They have also taken into account the heterogeneity in dropout rates across customers. Various combinations of these assumptions result in models like Pareto/NBD, beta-binomial/beta-geometric (BG/BB) and Markov models. Gupta et al. (2006) have also mentioned persistence models, which have been used in some CLV contexts to study the impact of advertising, discounting and product quality on customer equity (Yoo and Hanssens, 2005) and to examine differences in CLV resulting from different customer acquisition methods (Villanueva, Yoo, and Hanssens, 2006).

Some other Modelling Approaches: Donkers et al. (2007) have also made a review of various CLV modelling approaches with respect to the insurance industry. These include a status quo model, a Tobit-II model, univariate and multivariate choice models, and duration models. They grouped these models into two types. The first are relationship-level models, which focus on relationship length and total profit and build directly on the definition of CLV by Berger and Nasr (1998), in which d is a predefined discount rate and Profit, for a multiservice industry, sums over the J different services sold the product of Serv, a dummy indicating whether customer i purchases service j at time t, Usage, the amount of the service purchased, and Margin, the average profit margin for service j. The second are service-level models, which disaggregate a customer's profit into the contribution per service; CLV predictions are then obtained by predicting purchase behaviour at the service level and combining the results of both models. An overview of the models as presented by Donkers et al. (2007), with their mathematical forms, is given below:
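The Berger and Nasr (1998) definition referenced above, together with the multiservice profit decomposition, is commonly written as follows; the notation is reconstructed from the variable descriptions in the text rather than taken from the original equations:

```latex
\mathrm{CLV}_i = \sum_{t=1}^{T} \frac{\mathit{Profit}_{i,t}}{(1+d)^{t}},
\qquad
\mathit{Profit}_{i,t} = \sum_{j=1}^{J} \mathit{Serv}_{i,j,t}\,
\mathit{Usage}_{i,j,t}\, \mathit{Margin}_{j,t}
```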

An overview of relationship-level models: here the Status Quo Model assumes profit simply remains constant over time. The Profit Regression Model aims at predicting a customer's annual profit contribution. Retention Models are based on segmenting over RFM. The Probit Model is based on customer-specific retention probabilities, as is the Bagging Model. The Duration Model focuses on a customer's relationship duration. The Tobit-II Model separates the effect of customer defection on profitability. An overview of service-level models: these are explained as a choice-model approach and a duration-model approach. The choice-model approach has as dependent variable the decision to purchase a service or not. The duration-model approach focuses on the duration of an existing relationship; it only models the ending of a period and not the starting of a new one. The next part of the paper presents the machine learning approach we have used to calculate the future value of customers. A sample dataset from Microsoft Access 2000, the Northwind Traders database, is adopted to demonstrate our approach. We have used Classification and Regression Trees (CART), Support Vector Machines (SVM), SVM using SMO, Additive

Regression, the K-Star method, Multilayer Perceptron (MLP) and Wavelet Neural Network (WNN) to calculate the future value of customers. In the later part of the paper, we make a comparison of these models and suggest the best model to calculate the CLV. We end this paper with results and a discussion of future developments in the area of CLV measurement.

3. Estimating Future Customer Value using Machine Learning Techniques: There are various data mining techniques which are used in the fields of classification and regression. The choice of technique depends on the type of data available. In our case, we have used regression techniques to determine the future value of customers in the next prediction period. In the past, several researchers have used these techniques to determine the metrics of CLV, depending on the type of model and approach used. Hansotia and Wang (1997) used CART and CHAID for customer acquisition, Kim and Street (2004) used ANN for customer targeting, and Au et al. (2003) used Genetic Algorithms (GA) for predicting customer churn. However, using these techniques to directly predict a customer's future value, and hence CLV, has not been done so far. Most of the previous approaches to measuring CLV have used two or more models to calculate CLV or to determine the relationship between the various parameters used to determine it. The approach we have adopted tries to eliminate this process and allows the software which uses this technique to learn the relationship between the input variables and their weightage in calculating CLV.

3.1 Data Description: A sample database of Microsoft Access 2000, the Northwind Traders database, is adopted to calculate the CLV of customers.
The database contains 89 customers with a purchase period of 2 years, from 1st July 1994 till 30th June 1996. We have divided this time frame into 4 equal half-years and calculated the frequency of purchase and the total monetary contribution in July-December 1994, January-June 1995, July-December 1995 and January-June 1996. Further, we kept the observation period from July 1994 till December 1995 and made a prediction of the expected contribution in the next period, i.e. January-June 1996. The total number of variables used is 7, out of which 6 are input or predictor variables and the remaining one, the contribution margin in January-June 1996, is the target variable. The entire dataset is then divided into two parts: a) training and b) testing. We used 65 samples for training and the remaining 24 for testing purposes.
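The half-year aggregation described above can be sketched as follows; the transaction tuples and helper names are illustrative assumptions, not the actual Northwind extraction code:

```python
from collections import defaultdict

def month_score(year, month):
    # Recency score used in the study: July 1994 -> 1, ..., December 1995 -> 18.
    return (year - 1994) * 12 + month - 6

def aggregate(transactions):
    # Per-customer frequency and half-yearly contribution margins over
    # the observation window July 1994 - December 1995.
    # transactions: iterable of (customer_id, year, month, margin).
    features = defaultdict(lambda: {"recency": 0, "frequency": 0,
                                    "cm": [0.0, 0.0, 0.0]})
    for cust, year, month, margin in transactions:
        score = month_score(year, month)
        if not 1 <= score <= 18:
            continue  # outside the observation period
        f = features[cust]
        f["recency"] = max(f["recency"], score)
        f["frequency"] += 1
        f["cm"][(score - 1) // 6] += margin  # half-year bucket 0, 1 or 2
    return dict(features)
```

A transaction in January 1996 falls outside the observation window and so contributes only to the target variable, not to the predictors.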

Table 1: Description of variables

Type of variable    Variable name      Variable description
Input variable      Recency-dec95      Recency as a score, counting July 1994 as 1 and December 1995 as 18
Input variable      Total frequency    Total number of purchases between July 1994 and December 1995
Input variable      Total duration     Total duration of observation, i.e. from July 1994 till December 1995
Input variable      CM_july-dec94      Contribution margin in the period July-December 1994
Input variable      CM_jan-june95      Contribution margin in the period January-June 1995
Input variable      CM_july-dec95      Contribution margin in the period July-December 1995
Target variable     Output             Contribution margin in the period January-June 1996

3.2 Models and Software used: Knime 2.0.0, Salford Predictive Miner (SPM), NeuroShell 2 (Release 4.0) and a DEWNN software developed by Chauhan et al. (2009) at IDRBT, Hyderabad for classification problems are used for the analysis. In Knime, we used Support Vector Machines (SVM), SVM using SMO, Additive Regression and the K-Star method for learning on the training dataset, and the Weka predictor for prediction on the testing dataset. In Salford Predictive Miner (SPM), we used CART to train on the dataset and applied the rules obtained from training to the testing dataset for prediction. The software developed at IDRBT, Hyderabad was used to train the Wavelet Neural Network (WNN), whose learned parameters were then applied to the test data, and NeuroShell was used for the MLP. A brief description of the techniques used for prediction of the target variable follows.

SVM: The SVM is a powerful learning algorithm based on recent advances in statistical learning theory (Vapnik, 1998). SVMs are learning systems that use a hypothesis space of linear functions in a high-dimensional space, trained with a learning algorithm from optimization theory that implements a learning bias derived from statistical learning theory (Cristianini & Shawe-Taylor, 2000).
SVMs have recently become one of the most popular tools for machine learning and data mining and can perform both classification and regression. An SVM uses a linear model to implement non-linear class boundaries by mapping input vectors non-linearly into a high-dimensional feature space using kernels. The training examples that

are closest to the maximum-margin hyperplane are called support vectors. All other training examples are irrelevant for defining the binary class boundaries. The support vectors are then used to construct an optimal linear separating hyperplane (in the case of pattern recognition) or a linear regression function (in the case of regression) in this feature space. The support vectors are conventionally determined by solving a quadratic programming (QP) problem. SVMs have the following advantages: (i) they are able to generalize well even if trained with a small number of examples and (ii) they do not assume prior knowledge of the probability distribution of the underlying dataset. SVM is simple enough to be analyzed mathematically. In fact, SVM may serve as a sound alternative combining the advantages of conventional statistical methods, which are more theory-driven and easy to analyze, and machine learning methods, which are more data-driven, distribution-free and robust. Recently, SVMs have been used in financial applications such as credit rating, time series prediction and insurance claim fraud detection (Vinaykumar et al., 2008). In our research, we used two SVM learner models for prediction. First we used the SVM Regression model as the learner function and then used the Weka predictor to get the results. We found the correlation coefficient as  and the root relative squared error as 48.03%. In the case of SMO (the sequential minimal optimization algorithm) for training a support vector regression model, we replaced the learner function by the SMOreg function. This implementation globally replaces all missing values and transforms nominal attributes into binary ones. It also normalizes all attributes by default. Here we found the correlation coefficient as  and the root relative squared error as 47.98%.

Additive Regression and K-star: Additive Regression is another classifier used in Weka that enhances the performance of a regression base classifier.
Each iteration fits a model to the residuals left by the classifier of the previous iteration. Prediction is accomplished by adding the predictions of the classifiers. Reducing the shrinkage (learning rate) parameter helps prevent overfitting and has a smoothing effect, but increases learning time. K-star, on the other hand, is an instance-based classifier; that is, the class of a test instance is based upon the classes of those training instances similar to it, as determined by some similarity function. It differs from other instance-based learners in that it uses an entropy-based distance function. These techniques were applied in the same way as the SVM Regression and SMO Regression learners, using Weka predictors. In Additive Regression, we found the correlation coefficient as 0.895, the root mean squared error as  and the root relative squared error as 44.36%. In the case of K-star, we found the correlation coefficient as , the root mean squared error as  and the root relative squared error as 46.41%.
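The residual-fitting idea behind Additive Regression can be sketched with decision stumps as base learners; this is an illustrative reimplementation under assumed settings (stump base learner, shrinkage 0.3), not Weka's code:

```python
def fit_stump(x, y):
    # Find the split s minimizing squared error, predicting the mean on each side.
    best = None
    for s in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= s]
        right = [yi for xi, yi in zip(x, y) if xi > s]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((yi - ml) ** 2 for yi in left) + sum((yi - mr) ** 2 for yi in right)
        if best is None or sse < best[0]:
            best = (sse, s, ml, mr)
    _, s, ml, mr = best
    return lambda xi: ml if xi <= s else mr

def additive_regression(x, y, rounds=50, shrinkage=0.3):
    # Boost stumps on residuals; the ensemble prediction is the initial mean
    # plus the shrunken sum of the stump outputs.
    base = sum(y) / len(y)
    models = []
    residuals = [yi - base for yi in y]
    for _ in range(rounds):
        stump = fit_stump(x, residuals)
        models.append(stump)
        residuals = [ri - shrinkage * stump(xi) for xi, ri in zip(x, residuals)]
    return lambda xi: base + shrinkage * sum(m(xi) for m in models)

# Toy data: contribution margins growing non-linearly with frequency.
xs = list(range(1, 11))
ys = [xi * xi for xi in xs]
model = additive_regression(xs, ys)
```

Each round removes part of the residual energy explained by the best single split, so the training error falls steadily while the shrinkage keeps individual stumps from dominating.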

MLP: Multilayer Perceptron (MLP) is one of the most common neural network structures, as MLPs are simple and effective and have found a home in a wide assortment of machine learning applications. An MLP starts as a network of nodes arranged in three layers: the input, hidden and output layers. The input and output layers serve as nodes to buffer input and output for the model, respectively, and the hidden layer serves to provide a means for input relations to be represented in the output. Before any data is passed to the network, the weights of the nodes are random, which has the effect of making the network much like a newborn's brain: developed but without knowledge. MLPs are feed-forward neural networks trained with the standard backpropagation algorithm. They are supervised networks, so they require a desired response to be trained. They learn how to transform input data into a desired response, so they are widely used for pattern classification and prediction. A multilayer perceptron is made up of several layers of neurons, each fully connected to the next one. With one or two hidden layers, they can approximate virtually any input-output map. They have been shown to yield accurate predictions in difficult problems (Rumelhart, Hinton, & Williams, 1986). In our research, we used NeuroShell 2 (version 4.0) to determine the results. For learning purposes we set the learning rate as 0.5, the momentum rate as 0.1 and the scale function as linear [-1,1] to get the best results. We found the root mean squared error as 43.8%, which was the least among all the methods used, as we will see later.

WNN: The word wavelet is due to Grossmann et al. (1984). Wavelets are a class of functions used to localize a given function in both space and scaling. They have advantages over traditional Fourier methods in analyzing physical situations where the signal contains discontinuities and sharp spikes.
Wavelets were developed independently in the fields of mathematics, quantum physics, electrical engineering and seismic geology. Interchanges between these fields during the last few years have led to many new wavelet applications such as image compression, radar and earthquake prediction. A family of wavelets can be constructed from a function ψ(x), known as the mother wavelet, which is confined to a finite interval. Daughter wavelets ψ_{a,b}(x) are then formed by translation (b) and dilation (a). Wavelets are especially useful for compressing image data. An individual wavelet is defined by

ψ_{a,b}(x) = |a|^(-1/2) ψ((x - b)/a)

In the case of non-uniformly distributed training data, an efficient way of solving this problem is by learning at multiple resolutions. Wavelets, in addition to forming an orthogonal basis, are capable of explicitly representing the behaviour of a function at various

resolutions of input variables. Consequently, a wavelet network is first trained to learn the mapping at the coarsest resolution level. In subsequent stages, the network is trained to incorporate elements of the mapping at higher and higher resolutions. Such hierarchical, multi-resolution learning has many attractive features for solving engineering problems, resulting in a more meaningful interpretation of the resulting mapping and more efficient training and adaptation of the network compared to conventional methods. The wavelet theory provides useful guidelines for the construction and initialization of networks and, consequently, the training times are significantly reduced. Wavelet networks employ activation functions that are dilated and translated versions of a single function, where d is the input dimension (Zhang, 1997). This function, called the mother wavelet, is localized both in the space and frequency domains (Becerra, Galvao, & Abou-Seads, 2005). The wavelet neural network (WNN) was proposed as a universal tool for functional approximation, and shows surprising effectiveness in solving the conventional problem of poor convergence, or even divergence, encountered in other kinds of neural networks; it can dramatically increase convergence speed (Zhang et al., 2001). The WNN consists of three layers, namely the input layer, hidden layer and output layer. Each layer is fully connected to the nodes in the next layer. The numbers of input and output nodes depend on the numbers of inputs and outputs present in the problem. The number of hidden nodes is a user-defined parameter, typically between 3 and 15, depending on the problem. WNN is implemented here with the Gaussian wavelet function. The original training algorithm for a WNN is as follows (Zhang et al., 2001): 1) Specify the number of hidden nodes required.
Initialize randomly the dilation and translation parameters and the weights for the connections between the input and hidden layers and also between the hidden and the output layers. 2) The output value for sample k, k = 1, 2, ..., np, is computed as follows:

V_k = Σ_{j=1}^{nhn} W_j f( ( Σ_{i=1}^{nin} w_ij x_i^k - b_j ) / a_j )   (1)

where nin is the number of input nodes, nhn is the number of hidden nodes and np is the number of samples. In (1), when f(t) is taken as the Morlet mother wavelet it has the form

f(t) = cos(1.75 t) exp(-t^2 / 2)   (2)

And when taken as the Gaussian wavelet it becomes

f(t) = exp(-t^2)   (3)

3) Reduce the prediction error by updating W_j, w_ij, a_j and b_j. In training the WNN, the gradient descent algorithm is employed:

ΔW_j(t+1) = -η ∂E/∂W_j + α ΔW_j(t)   (4)
Δw_ij(t+1) = -η ∂E/∂w_ij + α Δw_ij(t)   (5)
Δa_j(t+1) = -η ∂E/∂a_j + α Δa_j(t)   (6)
Δb_j(t+1) = -η ∂E/∂b_j + α Δb_j(t)   (7)

where the error function is taken as

E = (1/2) Σ_{k=1}^{np} ( (V_k - V̂_k) / V̂_k )^2   (8)

with V̂_k the desired output for sample k, and η and α are the learning and momentum rates, respectively. 4) Return to step (2); the process is continued until E satisfies the given error criterion, and the whole training of the WNN is completed. Some problems, such as slow convergence, entrapment in local minima and oscillation, exist in the original WNN (Pan et al., 2008), and variants have been proposed to resolve them. In our research, we used software by Chauhan et al. (2009) for DEWNN (Differential Evolution trained Wavelet Neural Network). The software was initially made for classification purposes; we changed the code from classification to regression and used it for our problem. We set the weight factor as 0.95, the convergence criterion as , the crossover factor as 0.95, the population size as 60, the number of hidden nodes as 20, the maximum weight as 102 and the minimum weight as -102 to find the optimum solution. We found the test set normalized root mean square error as . The root relative squared error was %, the highest amongst all the results.

CART: Decision trees form an integral part of machine learning, an important sub-discipline of artificial intelligence. Almost all decision tree algorithms are used for solving

classification problems. However, algorithms like CART solve regression problems also. Decision tree algorithms induce a binary tree on given training data, resulting in a set of if-then rules. These rules can be used to solve the classification or regression problem. CART is a robust, easy-to-use decision tree tool that automatically sifts large, complex databases, searching for and isolating significant patterns and relationships. CART uses recursive partitioning, a combination of exhaustive search and intensive testing techniques, to identify useful tree structures in the data. This discovered knowledge is then used to generate a decision tree, resulting in reliable, easy-to-grasp predictive models in the form of if-then rules. CART is powerful because it can deal with incomplete data and multiple types of features (floats, enumerated sets), both in input features and predicted features, and the trees it produces contain human-readable rules. Decision trees contain a binary question (with a yes/no answer) about some feature at each node in the tree. The leaves of the tree contain the best prediction based on the training data. Decision lists are a reduced form of this, where an answer to each question leads directly to a leaf node. A tree's leaf node may be a single member of some class, a probability density function (over some discrete class), a predicted mean value for a continuous feature, or a Gaussian (mean and standard deviation for a continuous value). The key elements of a CART analysis are a set of rules for: (i) splitting each node in a tree, (ii) deciding when a tree is complete, and (iii) assigning each terminal node to a class outcome (or predicted value for regression). In our research, we used Salford Predictive Miner (SPM) to run CART for prediction. We trained the model using least absolute deviation on the training data.
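The least-absolute-deviation splitting criterion used here can be sketched as a single-split search; this is an illustrative sketch with made-up toy data, not SPM's implementation:

```python
def median(vals):
    s = sorted(vals)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

def best_lad_split(x, y):
    # Pick the threshold minimizing the summed absolute deviations
    # from the median of each side (CART's LAD criterion).
    best = None
    for s in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= s]
        right = [yi for xi, yi in zip(x, y) if xi > s]
        ml, mr = median(left), median(right)
        cost = sum(abs(yi - ml) for yi in left) + sum(abs(yi - mr) for yi in right)
        if best is None or cost < best[0]:
            best = (cost, s, ml, mr)
    return best  # (cost, threshold, left_median, right_median)

# Toy contribution margins: low spenders vs. high spenders.
x = [1, 2, 3, 4, 5, 6, 7, 8]
cm = [10, 12, 11, 14, 90, 95, 100, 105]
cost, threshold, left_med, right_med = best_lad_split(x, cm)
```

On this toy data the search recovers the obvious boundary between the two spending groups; a full tree would recurse on each side until a stopping rule fires.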
We found that the root mean squared error was  and the total number of nodes was 5; however, on growing the tree from 5 to 6 nodes, we found better results: the root mean squared error changed to  and the root relative squared error to 45.38%, which is very close to MLP. Figure 1 shows the plot of relative error vs. the number of nodes. We see that we got the optimum results on growing the tree from node 5 to node 6.

Figure 1: CART: Plot of relative error vs. number of nodes
Figure 2: CART: Plot of percent error vs. terminal nodes

It was also seen from the results that, when the optimum number of nodes was kept at 5, 19 out of 24 customers were put in node 1, 4 in node 3 and 1 in node 6. We also found that the root mean squared error for the 19 customers in node 1 was , which is better than the overall error; the overall increase in error was caused by misclassification or a high error rate in splitting customers into node 4 and node 6. On growing the optimum number of nodes to 6, we found that 14 customers were split into node 1, 5 into node 2, 4 into node 4 and 1 into node 6. The RMSE in node 1 was , far less than the total RMSE of . One obvious conclusion one can draw about CART is that it is more useful than the other methods for prediction, because its rules give companies the flexibility to decide which customer to put in which node and also to choose the optimum number of nodes for their analysis.

Figure 3: CART: Tree details showing the splitting rules at each node

A summary of the rules is given as:
1. if (CM_JULY_DEC95 <=  && CM_JAN_JUNE95 <= ) then y =
2. if (CM_JULY_DEC95 <=  && CM_JAN_JUNE95 >  && CM_JAN_JUNE95 <= ) then y =
3. if (CM_JAN_JUNE95 <=  && CM_JULY_DEC95 >  && CM_JULY_DEC95 <= ) then y =
4. if (CM_JAN_JUNE95 <=  && CM_JULY_DEC95 >  && TOTAL_FREQUENCY <= 14) then y =
5. if (CM_JAN_JUNE95 <=  && CM_JULY_DEC95 >  && TOTAL_FREQUENCY > 14) then y =
6. if (CM_JAN_JUNE95 > ) then y = ;
where y is the median of the node.
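Applied to a new customer, such a rule list works as a decision list: the first matching condition supplies its node's median as the prediction. The thresholds and medians below are hypothetical placeholders, not the actual values from Figure 3:

```python
# Each rule: (predicate over a customer feature dict, predicted node median).
# All thresholds and medians here are made-up placeholders for illustration.
RULES = [
    (lambda c: c["cm_jul_dec95"] <= 500 and c["cm_jan_jun95"] <= 300, 250.0),
    (lambda c: c["cm_jul_dec95"] <= 500 and c["cm_jan_jun95"] > 300, 900.0),
    (lambda c: c["cm_jul_dec95"] > 500 and c["total_frequency"] <= 14, 1500.0),
    (lambda c: True, 4000.0),  # fallback leaf
]

def predict(customer):
    # Walk the decision list and return the first matching leaf's median.
    for condition, node_median in RULES:
        if condition(customer):
            return node_median
    raise ValueError("rule list should be exhaustive")
```

Because every customer reaches exactly one leaf, the rules double as a segmentation: reading off which rule fired tells a firm which node, and hence which customer segment, a prediction came from.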

4. Results and Comparison of Models: We have used various machine learning techniques to calculate the future value of 24 customers from a sample of 89 customers: SVM, WNN, Additive Regression and the K-star method in Knime using the Weka predictor, CART in SPM and MLP in NeuroShell. We found that MLP gave the least error amongst all these models, but we find CART to be more useful, as it is more helpful in taking decisions by setting splitting rules and also predicts more accurately for a greater section of the test sample by splitting the sample into various nodes. We find that companies can make better decisions with the help of these rules and the segmentation technique in CART. A detailed summary of the final results of the competing models is given in Table 2. One limitation of our study is that we have predicted the future value of only the next time period. Besides this, the error percentage is relatively high because of the small dataset we have. We believe that these models will be able to perform better on a large dataset with more input variables, including customer demographics, customer behaviour etc.

Table 2: Comparison of Competing Models

Model          Correlation coefficient   Root mean squared error   Mean absolute error   Root relative squared error
SVMreg                                                                                   48.03%
SMOreg                                                                                   47.98%
Additive Reg.  0.895                                                                     44.36%
K-star                                                                                   46.41%
MLP            NA                        43.8%
CART           NA                                                                        45.38%

Figure 4: Graph of error vs. model
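The evaluation measures reported in Table 2 can be computed as follows; this is a generic sketch of the standard definitions with made-up sample values, not output from any of the tools used:

```python
import math

def rmse(actual, predicted):
    # Root mean squared error.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    # Mean absolute error.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rrse(actual, predicted):
    # Root relative squared error: error relative to always predicting the mean.
    mean_a = sum(actual) / len(actual)
    num = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    den = sum((a - mean_a) ** 2 for a in actual)
    return math.sqrt(num / den)

actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
```

An RRSE below 100% means the model beats the naive mean predictor, which is why values in the 44-48% range here indicate that all the models captured a substantial part of the variation.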

5. Conclusion and Directions for Future Research: In this paper we have presented a review of various approaches and modelling techniques to determine Customer Lifetime Value. We have also covered the traditional techniques used to calculate customer loyalty and found that CLV is a better metric compared to these measures. The most common approaches used to measure CLV are the aggregate approach and the individual approach. We also see that the type of approach used to calculate CLV depends on the type of data available and the type of result which a firm wants. Further, we have reviewed various modelling techniques to determine CLV, which include RFM models, computer science and stochastic models, econometric models, diffusion models, and also relationship-level and service-level models. We see that the most frequently applied techniques to determine CLV parameters, or the relationships between them, include Pareto/NBD models, decision trees, artificial neural networks, genetic algorithms and support vector machines. We have also presented a study of measuring CLV by means of various machine learning techniques. Emphasis has been given to catching the non-linear pattern in the data, which was available for a set of 89 customers having a 2-year transaction history. We have used Classification and Regression Trees (CART), Support Vector Machines (SVM), SVM using SMO, Additive Regression, the K-Star method, Multilayer Perceptron (MLP) and Wavelet Neural Network (WNN) for the calculation of the future value of 24 customers. Although MLP gives the best result amongst all these models, we would still recommend using CART to calculate CLV, as it segments the customers into various nodes and predicts more precisely for a larger segment of the test-case customers. Besides, the splitting rules would also help any firm to better understand the classification of a customer into a particular segment and hence derive more profit from them.
The main limitation of our study has been the projection of the future value of customers till only the next period, mainly due to the limitation of the dataset we had. This also resulted in high error rates even amongst the best models. These limitations can be overcome by using datasets which give more information about customer behaviour, demographics etc. Besides, a large dataset will be useful for making better predictions, as it allows the training parameters to be estimated better. For better estimation on small datasets, we have not covered techniques like k-fold cross-validation, which again can be taken as an area of future research. We have also not given much emphasis to feature selection and the relationships between the input variables used to calculate CLV. Producing better results with an integrated approach on this dataset is again an area of future research.

References:

Aeron, H., Kumar, A. and Janakiraman, M. (2010) Application of data mining techniques for customer lifetime value parameters: a review. Int. J. Business Information Systems, Vol. 6, No. 4.

Au, W., Chan, K. and Yao, X. (2003) A novel evolutionary data mining algorithm with applications to churn prediction. IEEE Transactions on Evolutionary Computation, 7(6).

Becerra, V. M., Galvao, H. and Abou-Seads, M. (2005) Neural and wavelet network models for financial distress classification. Data Mining and Knowledge Discovery, 11.

Benoit, D. F. and Van den Poel, D. (2009) Benefits of quantile regression for the analysis of customer lifetime value in a contractual setting: An application in financial services. Expert Systems with Applications, 36(7).

Berger, P. D. and Nasr, N. I. (1998) Customer lifetime value: Marketing models and applications. Journal of Interactive Marketing, 12.

Blattberg, R. C., Getz, G. and Thomas, J. S. (2001) Customer Equity: Building and Managing Relationships as Valuable Assets. Boston, MA: Harvard Business School Press.

Chauhan, N., Ravi, V. and Karthik Chandra, D. (2009) Differential evolution trained wavelet neural networks: Application to bankruptcy prediction in banks. Expert Systems with Applications, 36(4).

Cristianini, N. and Shawe-Taylor, J. (2000) An Introduction to Support Vector Machines. Cambridge, UK: Cambridge University Press.

Donkers, B., Verhoef, P. C. and de Jong, M. G. (2007) Modeling CLV: A test of competing models in the insurance industry. Quantitative Marketing and Economics, 5(2).

Dwyer, R. F. (1997) Customer lifetime valuation to support marketing decision making. Journal of Direct Marketing, Vol. 11, No. 4.

Fader, P. S., Hardie, B. G. S. and Lee, K. L. (2005) RFM and CLV: Using iso-CLV curves for customer base analysis. Journal of Marketing Research, 42 (November).

Gupta, S., Lehmann, D. R. and Stuart, J. A. (2004) Valuing customers. Journal of Marketing Research, 41(1), 7-18.

Gupta, S., Hanssens, D., Hardie, B., Kahn, W., Kumar, V. and Lin, N. (2006) Modelling customer lifetime value. Journal of Service Research, 9.

Haenlein, M., Kaplan, A. M. and Beeser, A. J. (2007) A model to determine customer lifetime value in a retail banking context. European Management Journal.

Hansotia, B. J. and Wang, P. (1997) Analytical challenges in customer acquisition. Journal of Direct Marketing, 11(2).

Hu, M. and Tsoukalas, C. (2003) Explaining consumer choice through neural networks: The stacked generalization approach. European Journal of Operational Research, Vol. 146, No. 3.

Kalbfleisch, J. D. and Prentice, R. L. (1980) Statistical Analysis of Failure Time Data. New York: Wiley.

Khajvand, M. and Tarokh, M. J. Estimating customer future value of different customer segments based on adapted RFM model in retail banking context. Procedia CS, (3).

Kim, Y. and Street, N. (2004) An intelligent recommendation system for customer targeting: A data mining approach. Decision Support Systems, 37(2).

Kumar, V. and Reinartz, W. J. (2006) Customer Relationship Management: A Databased Approach. New York: John Wiley.

Kumar, V. and George, M. (2007) Journal of the Academy of Marketing Science, 35.

Kumar, V. Customer lifetime value: The path to profitability. Foundations and Trends in Marketing, Vol. 2, No. 1, pp. 1-96.

Levinthal, D. and Fichman, M. (1988) Dynamics of interorganizational attachments: Auditor-client relationships. Administrative Science Quarterly, 33.

Malthouse, E. C. (2009) The results from the lifetime value and customer equity modeling competition. Journal of Interactive Marketing, Vol. 23.

Malthouse, E. C. and Blattberg, R. C. (2005) Can we predict customer lifetime value? Journal of Interactive Marketing, Vol. 19, No. 1.

Reinartz, W., Thomas, J. and Kumar, V. (2005) Balancing acquisition and retention resources to maximize customer profitability. Journal of Marketing, 69(1).

Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986) Learning representations by back-propagating errors. Nature, 323(6088).

Rust, R. T., Lemon, K. N. and Zeithaml, V. A. (2004) Return on marketing: Using customer equity to focus marketing strategy. Journal of Marketing, 68.

Thomas, J. (2001) A methodology for linking customer acquisition to customer retention. Journal of Marketing Research, 38(2).

Thomas, J. S., Blattberg, R. C. and Fox, E. J. (2004) Recapturing lost customers. Journal of Marketing Research, 41.

Vapnik, V. (1998) Statistical Learning Theory. New York: Wiley.

Venkatesan, R. and Kumar, V. (2004) A customer lifetime value framework for customer selection and resource allocation strategy. Journal of Marketing, 68 (October).

Villanueva, J., Yoo, S. and Hanssens, D. M. The impact of marketing-induced vs. word-of-mouth customer acquisition on customer equity. Journal of Marketing Research, February.

Vinay Kumar, K., Ravi, V., Carr, M. and Raj Kiran, N. (2008) Software development cost estimation using wavelet neural networks. Journal of Systems and Software, 81(11).

Yoo, S. and Hanssens, D. M. (2005) Modeling the sales and customer equity effects of the marketing mix. Working paper, University of California, Los Angeles, Anderson School of Management.

Zhang, Q. (1997) Using wavelet network in nonparametric estimation. IEEE Transactions on Neural Networks, 8(2), 227-236.


More information

Self Organizing Maps: Fundamentals

Self Organizing Maps: Fundamentals Self Organizing Maps: Fundamentals Introduction to Neural Networks : Lecture 16 John A. Bullinaria, 2004 1. What is a Self Organizing Map? 2. Topographic Maps 3. Setting up a Self Organizing Map 4. Kohonen

More information

A Hybrid Model of Data Mining and MCDM Methods for Estimating Customer Lifetime Value. Malaysia

A Hybrid Model of Data Mining and MCDM Methods for Estimating Customer Lifetime Value. Malaysia A Hybrid Model of Data Mining and MCDM Methods for Estimating Customer Lifetime Value Amir Hossein Azadnia a,*, Pezhman Ghadimi b, Mohammad Molani- Aghdam a a Department of Engineering, Ayatollah Amoli

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification

More information