Customer Churn Prediction, Segmentation and Fraud Detection in Telecommunication Industry
Ahsan Rehman (1), Abbas Raza Ali (2)
(1) Advanced Analytics and Big Data, IBM Pakistan
(2) Advanced Analytics and Big Data, IBM United Kingdom
Abstract
Every player in the telecommunication market is launching innovative business models and offering better services, which in turn has increased the cost of retaining customers. Realizing the importance of customer retention, service providers are now putting more effort into the prediction and prevention of customer churn. This work is divided into two phases. Phase one presents some of the commonly used data-mining techniques for identifying churners and for segmenting customers based on various Key Performance Indicators (KPIs). Phase two discusses how social network analysis extracts relationships between subscribers to improve the results produced by traditional learning algorithms at the individual subscriber level. The base model achieved satisfactory results, which were significantly enhanced by applying the social network analysis technique.
Keywords: Call Detail Record; Churn Prediction; Customer Segmentation; Fraud Detection; Social Network Analytics.
1. Introduction
Over the last few decades, mobile telecommunication has emerged as the dominant medium of communication across the world. In several countries, market saturation has reached a level where every potential customer has to be won over from competitors. At the same time, standardization of mobile infrastructure and public regulation have allowed customers to port easily from one network to another, resulting in a fluid market. Since the cost of winning a new customer is greater than the cost of retaining an existing one [1], mobile carriers have been shifting their attention from customer acquisition to customer retention.
As a result, churn prediction and prevention have become one of the most crucial business analytics applications, aimed at identifying customers who are about to transfer their business to a competitor (i.e. churn) [2]. A good churn prediction system should not only identify potential churners successfully, but also provide reasons for their churn and forecast such results over a sufficiently long horizon, e.g. six months. Once customers are identified as likely to churn, the marketing and retention department tries to retain their business by luring them with attractive, well-designed campaigns. A long forecast horizon is therefore an obvious advantage: the further the customer is from actually deciding to churn, the easier it is to prevent that decision, and at a significantly lower cost. Since retention efforts are constrained by limited resources, only a fraction of the subscriber-base can be contacted at any given time. With this constraint in mind, churn prediction models are usually measured by their ability to identify actual churners among the top 0.1% to 5% of customers predicted to have the highest risk of churning.
The telecom operator wanted to divide its existing revenue-generating subscriber-base into multiple segments with distinct usage patterns. The base needs to be grouped efficiently using clustering models, with each segment characterized by its own mean values [3]. This allows the service provider to target customers in particular segments with better offers, resulting in improved customer satisfaction and revenue generation.
The ever-increasing competition for a larger customer base means setting tougher sales targets, for which multiple steps are taken to promote new activations. Commission on new subscriber activations is one major incentive, and it has given rise to fraudulent sales.
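To make the evaluation criterion above concrete (how many actual churners appear among the top 0.1% to 5% of customers ranked by predicted risk), the following Python sketch computes the capture rate, i.e. the share of all actual churners found in the highest-risk slice, and the lift of that slice over the base churn rate. The function names and these exact definitions are illustrative assumptions; the paper does not give its evaluation code.

```python
def _ranked_flags(scores, is_churner):
    """Churn flags ordered from the highest to the lowest predicted risk."""
    ranked = sorted(zip(scores, is_churner), key=lambda pair: pair[0], reverse=True)
    return [flag for _, flag in ranked]


def capture_rate(scores, is_churner, top_fraction):
    """Share of all actual churners found in the top `top_fraction` of scored customers."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    flags = _ranked_flags(scores, is_churner)
    total_churners = sum(flags)
    if total_churners == 0:
        return 0.0
    cutoff = max(1, int(len(flags) * top_fraction))  # size of the contactable slice
    return sum(flags[:cutoff]) / total_churners


def lift(scores, is_churner, top_fraction):
    """Churn rate inside the top slice divided by the churn rate of the whole base."""
    flags = _ranked_flags(scores, is_churner)
    cutoff = max(1, int(len(flags) * top_fraction))
    base_rate = sum(flags) / len(flags)
    return (sum(flags[:cutoff]) / cutoff) / base_rate if base_rate else 0.0


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.05, 0.3, 0.4, 0.6, 0.5]
    churned = [1, 0, 1, 0, 0, 0, 0, 0, 1, 0]
    print(capture_rate(scores, churned, 0.2))  # 1 of the 3 churners is in the top 2 customers
    print(lift(scores, churned, 0.2))
```

In practice the same functions would be called with `top_fraction` values between 0.001 and 0.05 on the full scored base.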
Such fraudulent sales involve no usage activity after activation and therefore need to be identified and separated from the normal churn scoring process.
Traditional learning algorithms process each subscriber as an independent entity, which hides the inherent relationships between subscribers. A subscriber's Call Detail Record (CDR) contains various information pertaining to each call, e.g. caller, called number and timestamp of the call. Based on this information a call graph can be constructed, with customer mobile numbers as nodes and calls as edges; the analysis of such graphs is known as Social Network Analysis (SNA). The weight of an edge captures the strength of the relationship between two nodes, e.g. call duration or frequency [12]. In this study, SNA has been used to model these relationships between subscribers and to embed them on top of the traditional base model in order to enhance overall accuracy.
The subsequent sections of the paper are organized as follows. Section 2 briefly discusses existing churn prediction and customer segmentation systems. Section 3 describes the business challenges faced during this research. Section 4 gives an overview of gathering and preparing data from the data-sources. Section 5 presents the detailed analysis performed on the input data to make it compatible for modeling, along with preliminary analysis that led to the final system methodology, which is discussed in Section 6 and supported by the results recorded in Section 7. Finally, the paper is concluded in Section 8, followed by possible directions for future work in Section 9.
2. Literature Review
This section reviews research approaches related to churn prediction and customer segmentation, with a focus on the kinds of techniques used to solve business problems pertinent to telecom companies worldwide.
Gopal et al. [5] used ordinal regression to predict customer churn time by modeling the tenure of mobile customers. The dataset used in this work consisted of 100,000 customers with 169 independent features.
Only customers with a tenure of at least 7 months were included in the sample, and the study period spanned 25 months, with tenure divided into 5 ranks. The top 50 features were picked for modeling, and the dataset was randomly partitioned into two sets. Ordinal regression produced 86.21% accuracy, while multi-class classification gave an accuracy of 83.8%. The approach was then compared with Survival Analysis, a state-of-the-art method for tenure modeling, which captured 20% of churners in the top deciles (7 to 11 months period), whereas ordinal regression captured 20-45% of churners in the top deciles (25 months period).
ASE 2014 ISBN:
In another study, [8] applied Survival Analysis techniques to predict customer churn and compared them with existing conventional statistical methods. The technique aims to help telecom service providers understand customer churn risk and hazard in a timely manner. The objectives of the study were twofold: 1) to estimate the customer survival and customer hazard functions in order to gain knowledge of customer churn, and 2) to demonstrate how Survival Analysis techniques can be used to identify high-risk customers and their expected time of churn. The dataset consisted of about 40,000 active high-value customers, randomly selected from the entire customer-base over a period of 15 months. The final results were quite promising, with 90% of churners correctly identified.
In another study on predicting customer churn in mobile networks, [11] used an SNA technique called Group First Churn Prediction. This technique exploits the structure of customer interactions to predict which groups of subscribers are most prone to churn. Further, it uses second-order social metrics to analyse interactions within each group, which helps to identify social leaders. Two datasets from different service providers were used: one contained 7 days of call data consisting of 16 million instances, while the other contained 28 days of call data consisting of CDRs of 26 million subscribers. The results were integrated into a prior churn score, which significantly improved the operator's previous results.
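The study in [8] estimates customer survival and hazard functions, but the summary above does not name an estimator. As an illustration only, the sketch below uses the standard Kaplan-Meier (product-limit) estimator and treats customers who are still active when observation ends as right-censored; both choices are assumptions, not the documented method of [8].

```python
from collections import Counter


def kaplan_meier(durations, churned):
    """Kaplan-Meier product-limit estimate of the customer survival function.

    durations: observed tenure per customer (e.g. months).
    churned:   1 if the customer churned at that tenure, 0 if still active
               (right-censored) when observation ended.
    Returns (t, S(t), h(t)) for every distinct churn time t, where
    S(t) = P(tenure > t) and h(t) = d_t / n_t is the discrete hazard.
    """
    churn_events = Counter(t for t, c in zip(durations, churned) if c)
    leaving = Counter(durations)  # churned and censored customers both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = churn_events.get(t, 0)
        if d:
            hazard = d / at_risk
            survival *= 1.0 - hazard
            curve.append((t, survival, hazard))
        at_risk -= leaving[t]
    return curve


if __name__ == "__main__":
    tenure = [1, 2, 2, 3, 4, 5]
    churn = [1, 1, 0, 1, 0, 1]
    for t, s, h in kaplan_meier(tenure, churn):
        print(f"month {t}: S={s:.3f} hazard={h:.3f}")
```

A customer censored at month 2 still counts as "at risk" for the churn event at month 2, which is the usual Kaplan-Meier convention.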
In another study of social ties and their relevance to churn in mobile telecom networks, [12] used a simple diffusion process that exploits the social influences affecting churn. Their diffusion model was based on Spreading Activation (SPA) and examined the call patterns of millions of mobile phone users. The technique predicts potential churners by examining the current set of churners and their underlying social network. The dataset used in this work consisted of the aggregated call usage, frequency and duration of each user over the span of one month. After several experiments, the best predictor was obtained with a spreading factor of 0.72, and SPA correctly predicted about 50-60% of future churners.
In the domain of customer segmentation for telecom companies, [3] studied service usage behavior and discussed various clustering techniques for understanding customer behavior. Customers were clustered using k-means based on their CDRs and categorized into four loyalty groups. Focusing on 6 months of call-duration data with k = 5, the study labeled each cluster according to its distinct characteristics. The results defined a threshold for loyal and revenue-generating customers in terms of active time on the network, identifying 14% churners.
3. Business Challenges
During the last decade, rapid growth in the telecom industry has led to larger subscriber-bases for service providers, and handheld phones are becoming the dominant communication medium worldwide. The telecom operator discussed in this study has a subscriber-base of close to 20 million. The operator maintains a Data-Warehouse (DWH) that consolidates the relevant transactional and customer data records, and plans to use it for various analyses. The key challenge is to build a data-mining application for churn prediction that can reduce the gross churn rate by 5% for both post-paid and pre-paid subscribers.
Another major challenge stems from the current pre-paid subscriber-base, which comprises different segments. Customers have been consuming different packages and bundles according to their usage. These packages offer unlimited calls, messages and Value Added Services (VAS), so the subscriber-base is widely spread, with average spending per user ranging from $1 to $12 per month. The operator wanted to identify distinct segments according to subscribers' behavioral and usage attributes.
The operator started its operations with a subscriber-base of less than a million and planned to capture a substantial market share in the first few years. That target was achieved by setting aggressive sales targets and offering lucrative incentives to the sales and distribution channels. Consequently, while subscriptions were rapidly increasing, a new problem emerged, known as fraudulent sales. A fraudulent sale is an activation in which the free credit that comes with a new activation (valid for a few days) is consumed, but the subscriber never recharges afterwards. Hence no revenue is generated from such activations, while the sales channels still earn lucrative commissions on them. The operator now plans to build a fraudulent sales prediction model that can predict subscribers who will not generate revenue after the initial activation period, and also identify sales channels involved in large-scale fraudulent sales.
4. Data-Sources
A variety of data-sources are required to build churn prediction and customer segmentation models, including customer demographic and geographic information, call detail records, etc. In the context of the current work, very limited customer demographic and geographic information was available, as most data-sources contain only basic information, e.g. CDRs and operational data.
4.1 Data Gathering
For this research, the training dataset was gathered from three main data-sources: 1) customer call detail records, 2) call center contact details and 3) customer personal information. Table 1 lists the features from these data-sources. Most of the features are continuous, except a few, e.g. package plan, city, region and franchise. The source data is at customer-level granularity with daily transaction details. According to the retention policy of the telecom operator, a total of 6 months of pre-paid and 12 months of post-paid historical transactional data is maintained. The total pre-paid base is around 20 million and the post-paid base around 0.1 million, and around 20,000 new subscribers join the network daily. Of these 20,000 new subscribers, around 25% use the initial free balance offered with a new activation for the first 7 days and then never recharge, hence churn.
Table 1: Data-sources and their features
Customer call detail record:
1. Outgoing and incoming call usage, minutes and revenue
2. Outgoing and incoming short message service (SMS) usage and revenue
3. VAS details, e.g. internet (GPRS), interactive voice response (IVR), ringback tones and Multimedia Message Service (MMS) usage and revenue
4. Recharge count and amount
5. Current package plan details
Call center contact details:
6. Customers' call center contact details, including call duration and count
Customer personal information:
7. Customer location, including city and region
8. Customer purchase details, including franchise and default package
9. Customer gender
4.2 Data Preparation
Data preparation is an important process that deals with a number of key issues related to the organizational data environment, such as data quality, availability and loading. Given the importance of these issues, a structured approach to data preparation was adopted in this work. For each predictive model, a single structured table was prepared for building, assessing and scoring models in a real-time environment. The existing features were then used to build various derived features that give better meaning to customer behavior. The following derived features added a whole new dimension to predictive modeling:
1) Active days: total number of revenue-generating days
2) Average call distance: time between two consecutive calls
3) Maximum inactive days: longest run of consecutive inactive days
4) Network age: scoring date - activation date
Initially, the data for the pre-paid and post-paid models was limited to 6 months, but the post-paid base turned out to be very small (0.1 million). Since predictive modeling needs more records to build a comprehensive rule-set, 12 months of historical data was gathered for post-paid. The pre-paid model already had sufficient records (20 million), so only six months of data was used. For the early churn model, where data was limited to seven days, predictive modeling was a challenge. Keeping in mind the DWH retention policy, a maximum of 7 months of historical data was extracted and used for early churn modeling.
5. Data Preprocessing and Analysis
This section discusses the transformations that were applied to make the input data compatible for predictive modeling.
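The four derived features listed in Section 4.2 can be computed from per-subscriber daily records. The Python sketch below assumes a hypothetical input layout (a date-to-revenue mapping and a list of call timestamps) and reports the average call distance in hours; neither the layout nor the unit comes from the paper.

```python
from datetime import date, datetime, timedelta


def derived_features(daily_revenue, call_times, activation, scoring_date, window_start):
    """Compute the four derived features of Section 4.2 for one subscriber.

    daily_revenue: {date: revenue} for days with billed usage (hypothetical input).
    call_times:    datetimes of the subscriber's calls.
    The observation window runs from window_start (inclusive) to scoring_date (exclusive).
    """
    days = [window_start + timedelta(days=i) for i in range((scoring_date - window_start).days)]
    active = [daily_revenue.get(day, 0) > 0 for day in days]

    # 1) Active days: number of revenue-generating days in the window.
    active_days = sum(active)

    # 3) Maximum inactive days: longest run of consecutive days without revenue.
    longest = run = 0
    for is_active in active:
        run = 0 if is_active else run + 1
        longest = max(longest, run)

    # 2) Average call distance: mean gap between consecutive calls, in hours (unit assumed).
    ordered = sorted(call_times)
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(ordered, ordered[1:])]
    avg_call_distance = sum(gaps) / len(gaps) if gaps else None

    # 4) Network age: scoring date minus activation date, in days.
    network_age = (scoring_date - activation).days

    return {
        "active_days": active_days,
        "avg_call_distance_hours": avg_call_distance,
        "max_inactive_days": longest,
        "network_age_days": network_age,
    }


if __name__ == "__main__":
    revenue = {date(2014, 1, 1): 5.0, date(2014, 1, 5): 3.0, date(2014, 1, 6): 1.0}
    calls = [datetime(2014, 1, 1, 10), datetime(2014, 1, 1, 16), datetime(2014, 1, 2, 10)]
    print(derived_features(revenue, calls, date(2013, 12, 1), date(2014, 1, 11), date(2014, 1, 1)))
```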
The data was filtered to obtain accurate predictions, as raw data usually includes loosely controlled instances with out-of-range and missing values. The representation and quality of the data were therefore enhanced before further analysis. Data pre-processing involves a number of steps that can take considerable time and effort. These steps are explained as follows.
5.1 Data Audit
This process analyzes the quality of the dataset features. It provides a comprehensive first look at the raw data, which is often helpful for initial data exploration. The descriptive statistics of the three datasets are listed in Table 2, which reports the key features of each dataset against four statistical measures. The mean indicates the average value of each feature, while the standard deviation measures the spread of the data. Skewness indicates the degree to which a feature's distribution departs from symmetry about its mean, and the outlier percentage indicates whether a feature has extreme values. An initial analysis of each dataset was derived from this step (bold items in Table 2).
Table 2: Data Audit (columns: Features | Mean | Std. Dev. | Skewness | Outlier (%))
Early Churn: Total incoming minutes; Total incoming revenue; Total outgoing minutes; Total outgoing revenue; Total VAS revenue; Active days; Average call distance; Total inactive days
Post-paid: Active days; Average call distance; Maximum inactive days; Network age days; Outgoing calls minutes avg; Outgoing calls revenue avg; VAS revenue average; Revenue line-rent average; Payment amount average
Prepaid: Active days; Average call distance; Total inactive days; Network age days; Outgoing calls minutes avg; Outgoing calls revenue avg; VAS revenue average; Recharge value average; Total count of recharge
5.2 Sparseness Elimination
The initial data-audit phase (see Table 2) highlighted that the datasets had many missing values.
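A minimal sketch of the per-feature audit statistics reported in Table 2 (mean, standard deviation, skewness, outlier percentage), together with the share of missing values and the zero-filling used for sparseness elimination. The z-score outlier rule and the moment-based skewness formula are assumptions; the paper does not state which definitions its tooling used.

```python
import math


def audit(values, z_threshold=3.0):
    """Table 2 style statistics for one feature; None marks a missing value.

    Assumed outlier rule: |value - mean| > z_threshold * sample std.
    """
    present = [v for v in values if v is not None]
    n = len(present)
    missing_pct = 100.0 * (len(values) - n) / len(values) if values else 0.0
    if n < 2:
        return {"missing_pct": missing_pct}

    mean = sum(present) / n
    m2 = sum((v - mean) ** 2 for v in present) / n  # population variance
    m3 = sum((v - mean) ** 3 for v in present) / n  # third central moment
    std = math.sqrt(m2 * n / (n - 1))               # sample standard deviation
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0    # moment coefficient of skewness
    outliers = sum(1 for v in present if std > 0 and abs(v - mean) > z_threshold * std)

    return {
        "mean": mean,
        "std": std,
        "skewness": skewness,
        "outlier_pct": 100.0 * outliers / n,
        "missing_pct": missing_pct,
    }


def fill_missing_with_zero(values):
    """Sparseness elimination as described in the text: missing values become 0."""
    return [0 if v is None else v for v in values]


if __name__ == "__main__":
    recharge_avg = [1, 2, 3, 4, None]
    print(audit(recharge_avg))
    print(fill_missing_with_zero(recharge_avg))
```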
In the pre-paid and post-paid datasets, the affected key features included outgoing calls minute average, outgoing calls revenue average, VAS revenue average, recharge value average and total count of recharge, while in the early churn dataset all key features had missing values. After carefully analyzing the DWH business rules, sparseness was eliminated from the input datasets by replacing missing values with zeroes. This evened out irregular effects and gave a smooth transitional pattern. The sparseness-eliminated data also provided key insights about the early churn model, which are discussed in later sections of this paper.
5.3 Outlier Detection and Fixation
The initial data-audit phase also showed that the percentage of outliers in every feature needed careful handling. This process identified the top 1% of records with outliers or extreme values, which could otherwise affect the overall accuracy of the predictive model. The top-1% limit was strictly adhered to, so that only a minimum number of records was excluded from the scoring process. A rank-based anomaly detection algorithm [16] was used to search for unusual cases based on deviations from the norms of their cluster groups. It is designed to quickly detect unusual cases for data-auditing purposes in the exploratory data analysis step, prior to any inferential data analysis. The algorithm can be divided into three stages:
1) Modeling: based on the similarities of the input feature set (shown in Table 2), cases are placed into cluster groups. The cluster groups are identified using a clustering model (the groups for each dataset are given in Table 3), and sufficient statistics are used to calculate the norms of the cluster groups.
2) Scoring: the model is applied to each case to identify its cluster group, and indices are created for each case to measure its unusualness with respect to its cluster group. All cases are sorted by their anomaly index values, and the top portion of the case list is identified as the set of anomalies.
3) Reasoning: for each anomalous case, the variables are sorted by their corresponding variable deviation indices. The top variables, their values and the corresponding norm values are presented as the reasons why the case was identified as an anomaly.
Table 3 lists three groups for the post-paid and early churn datasets, while the pre-paid dataset has 10 groups, which indicate subscriber segments with distinct attribute properties. The anomaly detection process generated a model that was later used to detect anomalous records in the scoring data based on patterns found in the original training data.
Table 3: Outliers Grouping
Group | Early Churn | Prepaid | Postpaid
1 | 834,288 | 4,128,817 | 20,…
2 | …,165,790 | 1,765,918 | 11,…
3 | …,785,386 | 1,545,174 | 6,…
4 | | …,572,… |
5 | | …,740,… |
6 | | …,226,… |
7 | | …,060,… |
8 | | …,732,… |
9 | | …,718,… |
10 | | …,113,… |
5.4 Correlation Analysis
During the initial data preparation, a total of 125 features were loaded for pre-processing. After the initial data audit (Table 2), sparseness elimination and outlier removal (Table 3), the next step was feature selection, i.e. using correlation to identify the top features with a strong relationship to the target variable (churn status). Correlation measures the strength of the relationship between the target and an independent field, with values ranging between -1.0 and 1.0. Values close to +1.0 indicate a strong positive association, so that high values of the feature are associated with high values of the target. Values close to -1.0 indicate a strong negative association, so that high values of the feature are associated with low values of the target and vice versa. Values close to zero indicate a weak association, so that the values of the two fields are more or less independent. In Table 4, the correlation strength for labels has been computed as importance (1 - p), where p is the significance, i.e. the probability that the difference in means could be explained by chance alone. Table 4 shows the top six attributes for the three datasets, most of which have a strong relationship with the target.
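The correlation screening above can be sketched as follows: compute the Pearson correlation of each feature with the 0/1 churn target and keep the strongest. This is a simplification: the paper ranks features by importance (1 - p) from a significance test, whereas this sketch ranks by the absolute correlation |r|; the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; with a 0/1 churn target this is the point-biserial correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information about churn
    return sxy / math.sqrt(sxx * syy)


def top_features(features, target, top_n):
    """Rank feature columns by the strength |r| of their correlation with the target."""
    scored = {name: pearson(column, target) for name, column in features.items()}
    return sorted(scored.items(), key=lambda item: abs(item[1]), reverse=True)[:top_n]


if __name__ == "__main__":
    churn = [0, 0, 1, 1]
    columns = {"active_days": [30, 28, 5, 3], "noise": [1, 2, 1, 2]}
    print(top_features(columns, churn, top_n=1))  # active_days, strongly negative
```

A negative r for active days matches the intuition that subscribers with many active days rarely churn; the sign is kept in the output so that direction is not lost.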
Active days was a key feature that proved vital for all three churn prediction models. After correlation analysis, the top 25 features were identified and later used in churn prediction modeling.
Table 4: Correlation Analysis
Features | Early Churn | Prepaid | Postpaid
Active days | 0.50 (Strong) | (Strong) | (Strong)
Total incoming minutes | 0.14 (Strong) | (Strong) | (Strong)
Total incoming revenue | 0.2 (Strong) | 0.02 (Strong) | (Strong)
Total outgoing minutes | 0.42 (Strong) | 0.07 (Strong) | (Medium)
Total outgoing revenue | 0.24 (Strong) | 0.07 (Strong) | 0.01 (Strong)
Total VAS revenue | 0.35 (Strong) | (Weak) | (Strong)
5.5 Data Balancing
After the initial pre-processing steps, the datasets still had one major problem, imbalanced classes, which makes it harder for data-mining algorithms to extract meaningful patterns from the data. When classes are imbalanced, the ratio between the sizes of the output categories can be so skewed that the learning algorithm predicts only the majority class [9]. For example, the post-paid dataset consists of 31,964 (94%) active subscribers but only 2,203 (6%) churners, a typical case of imbalanced classes. One method of dealing with this problem is re-sampling, which can be applied in two ways: 1) under-sampling [7] and 2) over-sampling [6]. For the pre-paid and post-paid datasets, a random selection of entries from the active customers (around 10%) was removed to bring the ratio of subscribers who would churn to subscribers who would stay active to 16:84. This is a case of random under-sampling. Over-sampling, in contrast, strengthens the minority class by replicating a random selection of its records, which can result in overfitting.
5.6 Data Quality and Analysis
This section covers the detailed analysis of the pre-processed datasets of each model separately and highlights useful insights.
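The random under-sampling described in Section 5.5 amounts to keeping only as many active records as the target churn:active ratio allows. A sketch, assuming records that a caller-supplied function can classify as churner or active, and a fixed random seed for reproducibility (both assumptions of this example):

```python
import random


def undersample(records, is_churner, churn_share, seed=0):
    """Randomly drop active (majority) records until churners make up `churn_share`.

    records:     list of subscriber records.
    is_churner:  function returning True for a churner record.
    churn_share: desired minority share, e.g. 0.16 for a 16:84 churn:active ratio.
    """
    churners = [r for r in records if is_churner(r)]
    actives = [r for r in records if not is_churner(r)]
    # Solve churners / (churners + kept) = churn_share for the number of actives to keep.
    keep = round(len(churners) * (1 - churn_share) / churn_share)
    keep = min(keep, len(actives))
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = churners + rng.sample(actives, keep)
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    labels = [1] * 20 + [0] * 300  # 1 = churner, 0 = active
    balanced = undersample(labels, lambda r: r == 1, churn_share=0.16)
    print(len(balanced), sum(balanced) / len(balanced))
```

Every churner is kept; only the majority class is thinned, which is what distinguishes under-sampling from over-sampling.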
Early Churn/Fraud Detection: After pre-processing the early churn historical data, it was found that 670,000 (18%) of 3,700,000 subscribers had zero activity during the first 7 days of subscription. These zero-activity subscribers were already part of the subscriber-base but were distinctly identified only after pre-processing. Figure 1 shows the users with zero active days that were discovered after sparseness elimination.
Figure 1: Pre-processed early churn data
This finding led to an enhancement of the scoring model, in which all false activations made by distributors to earn commission on total activations were eliminated. These subscribers were producing noise in the data and were therefore removed from the early churn prediction model. In addition, they were tagged to their franchise for further action, i.e. to eliminate such sales.
Post-paid Churn: The pre-processed post-paid dataset highlighted certain insights that required action. Figure 2 shows that most of the spikes occur where subscribers have high outgoing call revenue and high total recharge but very low network age. Because these subscribers are not representative of the main post-paid base, they could badly influence the modeling process. Further analysis showed that the data contained 36 high-usage, high-revenue numbers that were non-churning corporate numbers; these were excluded so as not to distort the overall mean of the subscriber-base.
Figure 2: Post-paid dataset
Pre-paid Churn: The pre-processed dataset for pre-paid subscribers has been grouped into 10 distinct clusters (as shown in Table 3), which indicates that activity across the subscriber-base was quite diverse. Further analysis of these segments revealed that this high diversification was due to the lucrative packages with multiple bundle offers provided by the telecom operator. These bundles could be subscribed to on a daily, weekly or monthly basis and offered unlimited calls, messages and VAS services. Subscribers used these bundles with inconsistent subscription routines, and because of this varying usage their activity was split into a high number of segments.
Another behavior in the pre-paid base came from customers who generated outgoing revenue activity only for a very limited period. These subscribers had low active days and high maximum inactive days, meaning that they used the operator's services only after gaps of a few weeks. Further analysis of these reactivations revealed that these subscribers resumed service after receiving the extensive promotional offers that the operator sends to its dormant subscriber-base every few weeks. After extensive study of all such subscriber behaviors in the 10 segments, the pre-paid dataset was split into two models. The first model included all subscribers satisfying the following conditions:
1) Active days > 75
2) Total inactive days < 60
3) Maximum inactive days < 30
4) Total recharge > 100
Figure 3: Prepaid dataset
The second model included all subscribers discarded by the first model. Active subscribers with high active days and recharge can be seen in Figure 3, while subscribers with low active days and recharge are mostly churners. The challenge was to predict subscribers who have fewer active days but do not churn, and this was handled by the second model.
Customer Segmentation Model: The customer segmentation model was specifically designed for the pre-paid subscriber-base (see Table 2).
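The paper calls for clustering models whose segments each have their own mean values, and [3] uses k-means for this purpose, but the configuration of the segmentation model here is not given. As a hedged illustration, the sketch below implements plain k-means (Lloyd's algorithm) with optional z-score standardization; the algorithm choice, the features and the value of k are assumptions, not the paper's configuration.

```python
import random


def _squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def standardize(points):
    """Z-score each column so ARPU, dormancy and usage features share a common scale."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0 for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(p, means, stds)) for p in points]


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means. Returns (centroids, labels); each centroid is a segment's mean vector."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each subscriber joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _squared_distance(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


if __name__ == "__main__":
    # (ARPU, active days) for six hypothetical subscribers.
    subscribers = [(1.0, 5), (1.5, 8), (2.0, 4), (11.0, 28), (12.0, 30), (10.5, 29)]
    centroids, labels = kmeans(standardize(subscribers), k=2)
    print(labels)
```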
For this model, after rigorous analysis of the source data (Table 1), 44 distinct features were extracted that proved vital in producing distinct customer segments. The segmentation model built on these features splits the subscriber-base according to combinations of average revenue per user (ARPU), dormancy, gender, package and usage, which allows each segment to be handled according to its subscribers' usage pattern.
Social Network Analysis Model: Social Network Analysis (SNA) is applied on top of the traditional base models to extract relationships between subscribers. The traditional models process each subscriber as an individual entity, which hides the impact of a subscriber's churn on other connected users. The SNA-based model used three major features plus a target variable, covering the last 90 days of subscriber activity:
1) Caller number: the person making the call
2) Called number: the person being called
3) Duration of call: the duration of the call (in minutes)
4) Target variable: the list of churned subscribers
Using these features, a network (graph) is built in which the caller and called numbers are nodes and call durations are the edge weights. Some basic pre-processing was applied to the network to minimize noise, e.g. eliminating edges for calls shorter than 1 minute (on average about 10% of all calls made in a day) and discarding calls where the called number was missing (about 1% of daily calls).
6. Methodology
This section explains the steps involved in building the churn prediction, customer segmentation and early churn (fraud detection) models. Churn prediction, being a supervised learning model, has the target variable (churn/no-churn) available in the training data. This model is further divided into two parts: 1) data modeling and 2) scoring.
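The call-graph construction and noise filtering of the Social Network Analysis Model can be sketched as follows. Treating calls as undirected ties and the "churn exposure" feature (share of a subscriber's call minutes spent with churned contacts) are assumptions for illustration; the paper does not specify which graph features were fed into the base models.

```python
from collections import defaultdict


def build_call_graph(cdrs, min_minutes=1.0):
    """Build a weighted call graph from (caller, called, minutes) records.

    Nodes are mobile numbers; an edge weight is the total call minutes between
    two numbers. Calls shorter than `min_minutes` or with a missing called
    number are dropped, mirroring the noise filters described in the text.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for caller, called, minutes in cdrs:
        if not called or minutes < min_minutes:
            continue
        # Ties are treated as undirected here (an assumption of this sketch).
        graph[caller][called] += minutes
        graph[called][caller] += minutes
    return graph


def churn_exposure(graph, churned):
    """Hypothetical SNA feature: share of a subscriber's call minutes spent with churned contacts."""
    exposure = {}
    for number, neighbours in graph.items():
        total = sum(neighbours.values())
        with_churners = sum(w for other, w in neighbours.items() if other in churned)
        exposure[number] = with_churners / total if total else 0.0
    return exposure


if __name__ == "__main__":
    cdrs = [("A", "B", 5.0), ("A", "C", 0.5), ("A", None, 3.0), ("B", "C", 2.0), ("A", "C", 5.0)]
    graph = build_call_graph(cdrs)
    print(churn_exposure(graph, churned={"C"}))
```

A feature like this can be appended to each subscriber's row in the base-model table, which is one way of "embedding" relationship information on top of an individual-level model.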
The customer segmentation model, by contrast, initially produced customer segments based on the 44 features discussed in Section 5.6, and was later used to generate monthly subscriber segments and to track whether subscribers' revenue-generating activity had changed.
In this case study, three different models were developed for churn prediction, each with its own subscriber-base. For the post-paid churn model, the 0.1 million subscribers were split according to their billing cycle, and all subscribers active in the last 12 months were included in the model's base. For the pre-paid churn model, subscribers active in the last 6 months were included in the model's base. Scoring of the pre-paid model is scheduled on a weekly basis, while the post-paid model is scored per billing cycle. The resulting churn scores are then used to design new campaigns for subscribers.
The historical and predictive windows of the pre-paid and post-paid models are shown in Figure 4. The historical data consisted of multiple attributes aggregated monthly for every distinct subscriber. The historical data window is followed by a 14-day marketing window in which all subscribers in the base perform outgoing revenue-generating activity. This gap was kept as per the service provider's marketing requirements, to ensure that the subscriber is active during this period and can be pitched a retention campaign. The marketing window is followed by a 30-day predictive window, in which a subscriber's churn status is marked as inactive if no outgoing activity is found during the period, indicating the start of the dormancy period. The dormancy period is in turn followed by a 90-day inactivity window, which confirms that the subscriber has actually churned and abandoned the connection. On the other hand, subscribers carrying out any activity (even if they only receive incoming off-net calls and make no outgoing calls) until the end of the predictive window keep an active status and can be marked as no-churn at the end of the inactivity window.
In the modeling phase, the churn model is trained to predict subscribers' churn status using only the historical data window. Thus, on every scoring cycle, the model predicts subscribers' churn status for the predictive window. This churn status enables the marketing team to target campaigns at subscribers, especially those who become inactive during the predictive window.
Figure 4: Pre-paid and post-paid model
Figure 5 shows the different window sizes for the fraudulent sales/early churn model design. The subscriber-base of this model comprises newly activated customers who, as per the operator's policy, are eligible to consume free balance in the first 7 days of activation, with no recharge activity during those 7 days. The subscriber's recharge status (target variable) is derived from the 90 days following the modeling window. The model predicts newly activated subscribers who have not recharged in the first seven days and whose activation credit is confiscated (i.e. defrauded) after the 7th day of activation.
Figure 6: Early Churn (fraud detection) Model
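The labelling logic of the early churn (fraud detection) model can be sketched as follows, assuming a hypothetical per-activation record: subscribers who recharge within the first 7 days fall outside the model's base, zero-activity activations are flagged as fraudulent sales, and the remaining subscribers are labelled by whether they recharge within the following 90 days. The per-franchise fraud rate mirrors the tagging of fraudulent activations to their franchise; the record fields and label names are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

MODEL_WINDOW = timedelta(days=7)    # free-credit period after activation
TARGET_WINDOW = timedelta(days=90)  # window in which the recharge status is observed


@dataclass
class Activation:
    """One new activation (hypothetical record layout)."""
    msisdn: str
    franchise: str
    activated: date
    active_days_first_week: int     # days with any usage in the first 7 days
    first_recharge: Optional[date]  # None if the subscriber never recharged


def early_churn_label(act):
    """Return None (outside the model's base), 'fraud', 'early_churn' or 'recharged'."""
    end_of_model_window = act.activated + MODEL_WINDOW
    if act.first_recharge is not None and act.first_recharge < end_of_model_window:
        return None  # recharged inside the first 7 days: not part of this model's base
    if act.active_days_first_week == 0:
        return "fraud"  # zero activity: excluded from modeling and flagged as a fraudulent sale
    if act.first_recharge is not None and act.first_recharge < end_of_model_window + TARGET_WINDOW:
        return "recharged"
    return "early_churn"


def fraud_rate_by_franchise(activations):
    """Share of each franchise's activations flagged as fraudulent sales."""
    totals, frauds = Counter(), Counter()
    for act in activations:
        totals[act.franchise] += 1
        if early_churn_label(act) == "fraud":
            frauds[act.franchise] += 1
    return {name: frauds[name] / totals[name] for name in totals}


if __name__ == "__main__":
    d = date(2014, 1, 1)
    acts = [
        Activation("0001", "F1", d, 0, None),
        Activation("0002", "F1", d, 3, d + timedelta(days=10)),
        Activation("0003", "F2", d, 2, None),
    ]
    print([early_churn_label(a) for a in acts], fraud_rate_by_franchise(acts))
```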
The historical data-modeling window included 7 days of aggregated data with multiple features for distinct subscribers. For early churn (fraud detection) modeling, a 7-month dataset of newly activated subscribers with no recharge in the first 7 days was extracted from the DWH. For each subscriber, 7 days of aggregated data along with some behavioral features were used to build the model. From the pre-processed data, subscribers with new activations who had not performed any revenue-generating activity were identified. Because these activations had no data available (zero activity), they were excluded from the modeling data and declared as fraudulent sales.

The system architecture, with the detailed steps used to perform modeling and scoring for churn prediction and customer segmentation, is shown in Figure 5. The telecom service provider maintains its transactional and usage data in a DWH whose sources, Customer Relationship Management (CRM) and the Mobile Switching Centre (MSC), are updated on a daily basis.

Figure 5: Detailed architecture of churn prediction and customer segmentation models

ASE 2014 ISBN:

2014 ASE BigData/SocialInformatics/PASSAT/BioMedCom 2014 Conference, Harvard University, December 14-16, 2014

The DWH also maintains daily, weekly and monthly aggregated summary tables on top of the transactional records. The data is initially extracted from the DWH for each subscriber through customized PL/SQL procedures. These procedures are executed in various query blocks and consolidated at a later stage to produce a single table with a unique subscriber number and aggregated records as per model requirements, i.e., on a monthly basis for the pre-paid and post-paid models and on a weekly basis for early churn. The aggregated data includes subscribers' usage and revenue information for outgoing and incoming calls, short message service (SMS), VAS, and some derived features. This data is pre-processed following the steps discussed in Section 5. For all three models, the raw data was first audited to identify the refinements required to minimize noise, and then partitioned into training and testing sets for modeling and evaluation. The training set comprises 70% of the instances, while the testing set is composed of the remaining 30%. The training data is used to build the models, and the patterns they generate are later evaluated on the testing set. Data modeling uses a pool of churn prediction algorithms consisting of regression-based, decision tree (CHAID, C5.0, QUEST and C&R Tree) and Neural Network algorithms. These algorithms are used to build models on the training set, their accuracies are compared, and the model with the maximum accuracy is selected and converted to Predictive Model Markup Language (PMML). The PMML-formatted model is later used to score churn probabilities on a quarterly basis, and the scores are updated in the DWH.
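The 70/30 partition and maximum-accuracy model selection described above can be sketched in plain Python. The two learners below, a majority-class baseline and a one-split decision stump, are hypothetical stand-ins for the CHAID, C5.0, QUEST, C&R Tree, regression and neural network models used in this work, and the toy data is invented for illustration.

```python
import random
from collections import Counter


def train_test_split(rows, train_frac=0.7, seed=42):
    """Shuffle (features, label) rows and split them 70/30 by default."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_frac)
    return rows[:cut], rows[cut:]


class MajorityModel:
    """Baseline that always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


class StumpModel:
    """One-split decision stump; a hypothetical stand-in for a tree learner."""

    def fit(self, X, y):
        best = None
        for j in range(len(X[0])):
            for t in sorted({x[j] for x in X}):
                for below, above in ((1, 0), (0, 1)):
                    pred = [below if x[j] <= t else above for x in X]
                    acc = accuracy(y, pred)
                    if best is None or acc > best[0]:
                        best = (acc, j, t, below, above)
        _, self.j, self.t, self.below, self.above = best
        return self

    def predict(self, X):
        return [self.below if x[self.j] <= self.t else self.above for x in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def select_best(models, train, test):
    """Fit every candidate on the training set; keep the most accurate on the test set."""
    X_tr, y_tr = [x for x, _ in train], [y for _, y in train]
    X_te, y_te = [x for x, _ in test], [y for _, y in test]
    scores = {name: accuracy(y_te, m.fit(X_tr, y_tr).predict(X_te))
              for name, m in models.items()}
    return max(scores, key=scores.get), scores


# Toy data: one feature (monthly outgoing minutes); label 1 = churn, 0 = no-churn.
rows = [((m,), 1 if m < 50 else 0) for m in range(0, 200, 2)]
train, test = train_test_split(rows)
best, scores = select_best({"majority": MajorityModel(), "stump": StumpModel()}, train, test)
print(best, scores)
```

In the described pipeline the winning model would then be converted to PMML and used for quarterly scoring; that export depends on the modeling tool and is not shown here.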
These scores are ranked to identify the subscribers with the highest churn scores, along with the reasons for churn, which are used to pitch campaigns to predicted churners.

The customer segmentation model, shown in Figure 5, differs from churn prediction in that it has no target value. Customers are grouped into clusters with similar spending patterns and behaviors, which allows the telecom service provider to focus on the most appropriate product and service offerings for a particular subscriber. The pre-processed data is used for customer segmentation modeling, where features such as usage, revenue and active days are used to form distinct clusters of subscribers. The modeling process uses two clustering algorithms, 1) K-means and 2) Two-Step [17, 18], and the algorithm that produces the higher silhouette score is selected. These segments are passed to the marketing team to design different campaigns for potential churners. The model is scheduled to execute on a monthly basis to keep the tracking of positive-revenue segments up to date.

Social Network Analytics: Social network analysis was used to analyze the relationships between subscribers and to form interrelated groups that cannot be extracted with traditional techniques. Traditional data-mining algorithms process subscribers' data as i.i.d. (independent and identically distributed), which ignores the relationships (incoming and outgoing calls) between subscribers that are key in this context. It was also observed that the traditional algorithms are not able to produce accurate predictions for the least and most frequent callers. The least frequent callers are mostly predicted as churners because of insufficient calling patterns and revenue generation, while in reality these callers are not churners. At the other extreme, the most frequent callers are usually predicted as no-churn by the traditional models, but in reality these subscribers can churn.
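The observation about least and most frequent callers implies a routing step that decides which subscribers are scored by the base models and which by social network analytics. The sketch below assumes nearest-rank percentile cut-offs (10th and 90th) on call counts; the paper does not state the exact thresholds, so `low_pct`, `high_pct` and the function name are hypothetical.

```python
def route_by_call_frequency(call_counts, low_pct=0.10, high_pct=0.90):
    """Send the least and most frequent callers to SNA, everyone else to the base models.

    call_counts: dict of subscriber id -> number of calls in the modeling window.
    low_pct / high_pct: hypothetical percentile cut-offs (not given in the paper).
    """
    if not call_counts:
        return {}
    ordered = sorted(call_counts.values())
    last = len(ordered) - 1
    low = ordered[int(low_pct * last)]    # nearest-rank lower cut-off
    high = ordered[int(high_pct * last)]  # nearest-rank upper cut-off
    return {
        sub: "sna" if count <= low or count >= high else "traditional"
        for sub, count in call_counts.items()
    }


counts = {f"s{i}": i for i in range(1, 11)}   # s1 makes 1 call, ..., s10 makes 10
routes = route_by_call_frequency(counts)
print([s for s, r in routes.items() if r == "sna"])   # ['s1', 's9', 's10']
```

Percentile cut-offs adapt to each scoring cycle's distribution; fixed call-count thresholds would be an equally plausible choice.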
Because of these limitations, the most and least frequent subscribers are excluded from the traditional churn analysis, and social network analytics is applied to this set of subscribers instead. SNA is further divided into two techniques: 1) Diffusion analysis [12] and 2) Group analysis. Diffusion analysis is applied to predict churn, while group analysis is used to segment customers. Figure 7 shows the detailed architecture of SNA-based customer segmentation and churn prediction.

Fig. 7. Detailed architecture diagram of Social Network Analytics

Churn Prediction using Diffusion Analysis: A subscriber's calling pattern has an impact on other peer nodes in their social structure. The diffusion analysis technique is specifically applied to individuals who are affected by churners in their group. This technique uses four features extracted from the DWH CDR table, which are discussed in Section 5. The caller and called numbers in the CDRs form the nodes of the network, while call durations represent the edge weights. Using this network, the diffusion analysis technique generates a set of KPIs, e.g., the diffusion energy, which is used as the churn score, and the in- and out-degree of calls. The subscribers' churn scores are ranked, and the top scorers (those with high diffusion energy) are marked as probable churners. These subscribers should be targeted immediately with effective campaigns and offers.

Customer Segmentation using Group Analysis: The group analysis technique is applied to the CDRs to form customer segments, where the node (representing a subscriber) with the maximum number of in and out edges (calls) is marked as the leader of that group. This model was initially intended to target group leaders, as they are highly influential over other group members. The group leaders are targeted with service offerings, and their influence helps promote those offerings to the wider group, whose members are likely to subscribe to similar offerings. Another objective of this model is to attract a group leader from a competing telecom operator. If this leader successfully ports out of the competitor's network, this can increase the churn rate of the group members associated with that competitor while reducing the churn rate of the group members associated with this telecom service provider. The group analysis algorithm uses this data to generate a set of KPIs, including size, density and a ranking for each group based on incoming and outgoing calls (degree) and call duration (weight), to identify group leaders.

7. Results

The accuracy of the churn prediction model is computed using a confusion matrix, in which the actual and predicted churn values are compared to derive the churn prediction accuracy.
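A minimal sketch of the diffusion scoring and the confusion-matrix evaluation described above, assuming a simple spreading-activation scheme in the spirit of [12]: known churners start with unit energy, each step passes a fixed fraction of every node's energy to its neighbours in proportion to call duration, and the remaining non-churners are ranked by the energy they hold. The spreading factor, number of steps, undirected weighting, top-k cut-off, toy CDRs and ground-truth labels are hypothetical choices, not the parameters or data used in this work.

```python
from collections import defaultdict


def build_call_graph(cdrs):
    """Undirected weighted graph from (caller, callee, duration) CDR tuples."""
    graph = defaultdict(lambda: defaultdict(float))
    for caller, callee, duration in cdrs:
        if caller == callee:
            continue
        graph[caller][callee] += duration   # edge weight = total call duration
        graph[callee][caller] += duration
    return graph


def diffusion_energy(graph, churners, spread=0.5, steps=3):
    """Spreading activation: known churners start with energy 1.0 and, at each
    step, pass a `spread` fraction of their energy to neighbours in proportion
    to edge weight. Returns the energy held by non-churners."""
    energy = {c: 1.0 for c in churners}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, e in energy.items():
            nbrs = graph.get(node, {})
            total = sum(nbrs.values())
            if total == 0:
                nxt[node] += e          # isolated node keeps its energy
                continue
            nxt[node] += (1 - spread) * e
            for nbr, w in nbrs.items():
                nxt[nbr] += spread * e * w / total
        energy = dict(nxt)
    return {n: e for n, e in energy.items() if n not in churners}


def confusion_matrix(actual, predicted, positive="churn"):
    """Return (tp, fp, tn, fn) for a binary churn / no-churn labelling."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, fp, tn, fn


cdrs = [("A", "B", 120), ("A", "C", 60), ("B", "C", 30),
        ("C", "A", 90), ("D", "E", 300), ("E", "D", 45)]
scores = diffusion_energy(build_call_graph(cdrs), churners={"A"})
ranked = sorted(scores, key=scores.get, reverse=True)   # ['C', 'B']

# Hypothetical ground truth; the top-1 diffusion scorer is flagged as a churner.
subscribers = ["B", "C", "D", "E"]
actual = ["no-churn", "churn", "no-churn", "no-churn"]
predicted = ["churn" if s in ranked[:1] else "no-churn" for s in subscribers]
tp, fp, tn, fn = confusion_matrix(actual, predicted)
print(ranked, (tp + tn) / (tp + fp + tn + fn))   # ['C', 'B'] 1.0
```

Because each step only redistributes energy, the total stays equal to the number of seed churners, which keeps scores on a common scale; subscribers with no path to a churner (D and E here) receive no score at all.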
The accuracy of the customer segmentation model is computed using the Silhouette measure of cohesion and separation [10]. The churn prediction model was initially evaluated with traditional (base) algorithms, whose results are reported in Table 5. Additionally, SNA was applied to the instances that the base algorithms were unable to predict correctly (the most and least frequent callers) to boost the overall accuracy of the system. The SNA technique shows a boost of 10% on the pre-paid and 8% on the post-paid model.

Table 5: Comparison of Machine Learning algorithms used for churn prediction
Algorithm (scores of traditional approaches) | Early-churn (%) | Pre-paid (%) | Post-paid (%)
C5.0 | | |
C&R Tree | | |
CHAID | | |
QUEST | | |
Social Network Analytics Boost | | |
Total Accuracy | | |

Similarly, traditional clustering algorithms are used as base models, and the result of one of the segmentation algorithms is sufficient to meet the target silhouette of cohesion and separation [10] of at least 0.5 (as shown in Table 6).

Table 6: Comparison of Unsupervised Learning algorithms used for Customer Segmentation
Algorithm (scores of traditional approaches) | Silhouette | Number of Clusters | Largest Cluster (%)
K-Means | | |
Two-Step | | |

The clusters shown in Figure 8 still contain too many subscribers to be targeted at the same time, so the top subscribers are listed using the SNA group analysis technique. The results list the top 300 subscribers who are social leaders with high ARPU. Because they have high influence over several other subscribers, they can easily spread new offers and promotions to their peers, thereby saving the operator's expenses.

Figure 8: Segments and their proportions generated with K-means algorithm

8. Discussion and Conclusion

This research work proposed a combination of traditional and SNA-based approaches to two conventional telecom problems, churn prediction and customer segmentation.
The telecom service provider discussed here was facing a high churn rate and fraudulent sales, resulting in losses in revenue and subscriber base for the operator. These problems required advanced analytical techniques to address them. Every data-mining problem requires a sufficient amount of historical data to build predictive models, but the available data consisted of a limited number of features and contained noise. This data was pre-processed using in-depth analysis to minimize noise and ensure a better understanding of customer behavior. Moreover, the traditional learning algorithms were not able to capture the relationships between subscribers, which carry useful patterns for churn prediction and customer segmentation. As a result, SNA-based algorithms were used on top of the base (traditional) models to incorporate multi-dimensional analysis and boost the overall accuracy of the system.

9. Future Work

The analysis of the telecom data using different data-mining techniques was aimed at predicting subscribers' churn behavior, finding leaders and churn-affected subscribers in a group, improving customer relationship management, and developing campaign strategies for customer retention and loyalty. The local telecom market presents subscribers with lucrative offers and easy carrier switching, so customers need to be profiled into specific groups, and only targeted campaigns and packages will allow such customers to be retained. As a next step, using more features, such as call quality and customer complaint types including severity level and resolution time, can further boost the performance of the churn prediction models. Demographic information about the subscribers can also help segment customers more efficiently and accurately.
Using text analytics to convert unstructured data, i.e., customer feedback, into structured data can further improve the models. These enhancements could add a new dimension to the churn and segmentation models, in which customer behavior and attitude can be learnt and predicted.

Acknowledgment

The author wishes to thank Sanket Jain, Zunaira Rasheed, Shamyl Bin Mansoor and Syed Yasir Hassan for taking part in discussions of the subject and for reviewing this work.

References

[1] J. Hadden, A. Tiwari, R. Roy, and D. Ruta, Computer Assisted Customer Churn Management: State-of-the-Art and Future Trends, Journal of Computers and Operations Research, vol. 34, issue 10, Oct. 2007.
[2] R. Fildes, Telecommunications Demand Forecasting - A Review, International Journal of Forecasting, vol. 18, 2002.
[3] S. Aheleroff and M. R. Gholamian, Customer Segmentation for a Mobile Telecommunications Company Based on Service Usage Behavior, in Proceedings of the 3rd International Conference on Data Mining and Intelligent Information Technology Applications, Macau, China, Oct. 2011.
[4] Z. Zhongding, M. Xuemei and L. Guangcan, Customer Segmentation Algorithm of Wireless Content Service Based on Ant K-Means, in Proceedings of the 2009 International Forum on Computer Science-Technology and Applications, vol. 1, 2009.
[5] R. K. Gopal and S. K. Meher, Customer Churn Time Prediction in Mobile Telecommunication Industry Using Ordinal Regression, in Proceedings of the 12th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, 2008.
[6] V. Garcia, J. S. Sanchez, R. A. Mollineda, R. Alejo, and J. M. Sotoca, The class imbalance problem in pattern classification and learning, in F. J. Ferrer-Troyano et al., editors, II Congreso Español de Informática, 2007.
[7] X. Y. Liu, J. Wu, and Z. H. Zhou, Exploratory under-sampling for class imbalance learning, in ICDM, IEEE Computer Society, 2006.
[8] J. Lu, Predicting Customer Churn in the Telecommunications Industry - An Application of Survival Analysis Modeling Using SAS, in SUGI 27, Orlando, Florida, Apr. 2002.
[9] C. Drummond and R. C. Holte, Severe class imbalance: Why better algorithms aren't the answer, in Proceedings of the 16th European Conference on Machine Learning.
[10] P. J. Rousseeuw, Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis, Journal of Computational and Applied Mathematics, 1987.
[11] Y. Richter, E. Yom-Tov, and N. Slonim, Predicting Customer Churn in Mobile Networks through Analysis of Social Groups, in Proceedings of the SIAM International Conference on Data Mining (SDM), Columbus, Ohio, 2010.
[12] K. Dasgupta, R. Singh, B. Viswanathan, D. Chakraborty, S. Mukherjea, A. A. Nanavati and A. Joshi, Social Ties and their Relevance to Churn in Mobile Telecom Networks, in Proceedings of the 11th Conference on Extending Database Technology, Nantes, France, Mar.
[13] Pushpa and G. Shobha, An Efficient Method of Building the Telecom Social Network for Churn Prediction, International Journal of Data Mining & Knowledge Management Process (IJDKP), vol. 2, issue 3, May 2012.
[14] E. Xevelonakis and P. Som, The impact of social network-based segmentation on customer loyalty in the telecommunication industry, Journal of Database Marketing & Customer Strategy Management, vol. 19, issue 2, May 2012.
[15] K. Coussement and D. Van den Poel, Churn Prediction in Subscription Services: An Application of Support Vector Machines while Comparing Two Parameter-Selection Techniques, Journal of Expert Systems with Applications, vol. 34, issue 1, Jan. 2008.
[16] H. Huang, Rank Based Anomaly Detection Algorithms, Electrical Engineering and Computer Science Dissertations, Paper 331.
[17] T. Zhang, R. Ramakrishnan, and M. Livny, BIRCH: An efficient data clustering method for very large databases, in Proceedings of the ACM SIGMOD Conference on Management of Data, Montreal, Canada, 1996.
[18] T. Chiu, D. Fang, J. Chen, Y. Wang, and C. Jeris, A Robust and Scalable Clustering Algorithm for Mixed Type Attributes in Large Database Environment, in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA.

Ahsan Rehman is a Data Analyst working at IBM Business Analytics and Optimization, Global Business Services. He received his B.S. degree in Information Technology from the National University of Sciences and Technology. His current research interests include predictive analytics, social network analytics and big data.

Abbas R. Ali is a Data Scientist working at IBM Business Analytics and Optimization Center of Competence. He received his B.S. degree in computer science and mathematics from the Institute of Management Sciences in 2004 and his M.S. degree in artificial intelligence and natural language processing from the National University of Computers and Emerging Sciences in 2009, and is currently pursuing his PhD in machine learning and predictive analytics at Bournemouth University. His current area of research is Meta-level Learning in the Context of Multi-component, Multi-level Evolving Predictive Systems.
White Paper How Streaming Data Analytics Enables Real-Time Decisions Contents Introduction... 1 What Is Streaming Analytics?... 1 How Does SAS Event Stream Processing Work?... 2 Overview...2 Event Stream
Use of Data Mining in Banking
Use of Data Mining in Banking Kazi Imran Moin*, Dr. Qazi Baseer Ahmed** *(Department of Computer Science, College of Computer Science & Information Technology, Latur, (M.S), India ** (Department of Commerce
Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results
, pp.33-40 http://dx.doi.org/10.14257/ijgdc.2014.7.4.04 Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results Muzammil Khan, Fida Hussain and Imran Khan Department
Why is Internal Audit so Hard?
Why is Internal Audit so Hard? 2 2014 Why is Internal Audit so Hard? 3 2014 Why is Internal Audit so Hard? Waste Abuse Fraud 4 2014 Waves of Change 1 st Wave Personal Computers Electronic Spreadsheets
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria
In this presentation, you will be introduced to data mining and the relationship with meaningful use.
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments
Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for
IBM SPSS Direct Marketing 22
IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release
Torquex Customer Engagement Analytics. End to End View of Customer Interactions and Operational Insights
Torquex Customer Engagement Analytics End to End View of Customer Interactions and Operational Insights Rob Witthoft Torquex {Pty) Ltd 10/1/2015 Torquex Customer Engagement Analytics Torquex Customer Engagement
Cleaned Data. Recommendations
Call Center Data Analysis Megaputer Case Study in Text Mining Merete Hvalshagen www.megaputer.com Megaputer Intelligence, Inc. 120 West Seventh Street, Suite 10 Bloomington, IN 47404, USA +1 812-0-0110
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
Customer Analytics. Turn Big Data into Big Value
Turn Big Data into Big Value All Your Data Integrated in Just One Place BIRT Analytics lets you capture the value of Big Data that speeds right by most enterprises. It analyzes massive volumes of data
Role of Social Networking in Marketing using Data Mining
Role of Social Networking in Marketing using Data Mining Mrs. Saroj Junghare Astt. Professor, Department of Computer Science and Application St. Aloysius College, Jabalpur, Madhya Pradesh, India Abstract:
Mining Telecommunication Networks to Enhance Customer Lifetime Predictions
Mining Telecommunication Networks to Enhance Customer Lifetime Predictions Aimée Backiel 1, Bart Baesens 1,2, and Gerda Claeskens 1 1 Faculty of Economics and Business, KU Leuven, Belgium {aimee.backiel,bart.baesens,gerda.claeskens}@kuleuven.be
Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms
Data Mining Techniques forcrm Data Mining The non-trivial extraction of novel, implicit, and actionable knowledge from large datasets. Extremely large datasets Discovery of the non-obvious Useful knowledge
Data Warehousing and Data Mining in Business Applications
133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
Sentiment Analysis on Big Data
SPAN White Paper!? Sentiment Analysis on Big Data Machine Learning Approach Several sources on the web provide deep insight about people s opinions on the products and services of various companies. Social
COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
WHITE PAPER. Social media analytics in the insurance industry
WHITE PAPER Social media analytics in the insurance industry Introduction Insurance is a high involvement product, as it is an expense. Consumers obtain information about insurance from advertisements,
Research of Postal Data mining system based on big data
3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Research of Postal Data mining system based on big data Xia Hu 1, Yanfeng Jin 1, Fan Wang 1 1 Shi Jiazhuang Post & Telecommunication
DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
Churn Management - The Colour of Money (*)
Churn Management - The Colour of Money (*) Carole MANERO IDATE, Montpellier, France R etaining customers is one of the most critical challenges in the maturing mobile telecommunications service industry.
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
SAS Fraud Framework for Banking
SAS Fraud Framework for Banking Including Social Network Analysis John C. Brocklebank, Ph.D. Vice President, SAS Solutions OnDemand Advanced Analytics Lab SAS Fraud Framework for Banking Agenda Introduction
Customer Relationship Management using Adaptive Resonance Theory
Customer Relationship Management using Adaptive Resonance Theory Manjari Anand M.Tech.Scholar Zubair Khan Associate Professor Ravi S. Shukla Associate Professor ABSTRACT CRM is a kind of implemented model
Paper AA-08-2015. Get the highest bangs for your marketing bucks using Incremental Response Models in SAS Enterprise Miner TM
Paper AA-08-2015 Get the highest bangs for your marketing bucks using Incremental Response Models in SAS Enterprise Miner TM Delali Agbenyegah, Alliance Data Systems, Columbus, Ohio 0.0 ABSTRACT Traditional
Beyond listening Driving better decisions with business intelligence from social sources
Beyond listening Driving better decisions with business intelligence from social sources From insight to action with IBM Social Media Analytics State of the Union Opinions prevail on the Internet Social
Data Mining for Everyone
Page 1 Data Mining for Everyone Christoph Sieb Senior Software Engineer, Data Mining Development Dr. Andreas Zekl Manager, Data Mining Development Page 2 Executive Summary Contents 2 Data mining in the
Data Mining + Business Intelligence. Integration, Design and Implementation
Data Mining + Business Intelligence Integration, Design and Implementation ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution
